Methods for genetic analysis

ABSTRACT

Methods are described for assessing an individual's likelihood of developing or exhibiting a multifactorial trait, and for predicting the effectiveness of a drug treatment regimen in an individual. The methods include determining a plurality of genotypes for the individual at a plurality of biallelic polymorphic loci, using the genotypes to compute a score for the individual, and comparing the score to at least one threshold value. Genetic tests are also described for assessing an individual's likelihood of developing or exhibiting a multifactorial trait.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. application Ser. No. 10/956,224, filed Sep. 30, 2004, entitled “Methods for Genetic Analysis,” now allowed, which claims the benefit of U.S. Provisional Application No. 60/550,662, filed Mar. 5, 2004, entitled “Use of High-density Whole Genome Scanning to Select Individuals Genetically Predisposed to Adverse Events”, U.S. Provisional Application No. 60/566,302, filed Apr. 28, 2004, entitled “Methods for Genetic Analysis”, and U.S. Provisional Application No. 60/590,534, filed Jul. 22, 2004, entitled “Methods for Genetic Analysis”, all of which are incorporated by reference herein in their entireties for all purposes.

BACKGROUND OF THE INVENTION

The DNA that makes up human chromosomes provides the instructions that direct the production of all proteins in the body. These proteins carry out the vital functions of life. Variations in the sequence of DNA encoding a protein produce variations or mutations in the proteins encoded, thus affecting the normal function of cells. Although environment often plays a significant role in disease, variations and/or mutations in the DNA of an individual are directly related to almost all human diseases, including cardiovascular, metabolic and infectious disease, cancer, and autoimmune disorders. Moreover, knowledge of genetics, particularly human genetics, has led to the realization that many diseases result from complex interactions of several genes or their products. For example, Type I and II diabetes have been linked to multiple genes, each with its own pattern of mutations.

Additionally, knowledge of human genetics has led to a limited understanding of variations between individuals when it comes to drug response, the field of pharmacogenetics. Over half a century ago, adverse drug responses were correlated with amino acid variations in two drug-metabolizing enzymes, plasma cholinesterase and glucose-6-phosphate dehydrogenase. Since then, careful genetic analyses have linked sequence polymorphisms (variations) in over 35 drug metabolism enzymes, 25 drug targets and 5 drug transporters with compromised levels of drug efficacy or safety (Evans and Relling, Science 286:487-91 (1999)). In the clinic, such information is being used to prevent drug toxicity; for example, patients are screened routinely for genetic differences in the thiopurine methyltransferase gene that cause decreased metabolism of 6-mercaptopurine or azathioprine. Yet only a small percentage of observed drug toxicities have been explained adequately by the set of pharmacogenetic markers validated to date. Even more common than toxicity issues may be cases where drugs demonstrated to be safe and/or efficacious for some individuals have been found to have either insufficient therapeutic efficacy or unanticipated side effects in other individuals.

Because any two humans are 99.9% similar in their genetic makeup, most of the sequence of the DNA of their genomes is identical. However, there are variations in DNA sequence between individuals. For example, there are deletions of many-base stretches of DNA, insertions of stretches of DNA, variations in the number of repetitive DNA elements in coding or non-coding regions, and changes in single nitrogenous base positions in the genome called “single nucleotide polymorphisms” (SNPs). Human DNA sequence variation accounts for a large fraction of observed differences between individuals, including susceptibility or resistance to disease and how an individual will respond to a particular therapeutic or treatment regimen.

Multifactorial traits, or complex traits, are influenced by multiple factors, such as genes, environmental factors, and their interactions. Often, more than one combination of genetic and/or environmental factors will result in the same multifactorial trait, and this complexity makes it difficult to determine who will develop such a trait. Further, the contribution of each factor is typically not identical to the contributions of every other factor. That is, for example, some factors may have a very strong contribution while others may have a very weak contribution. To complicate the biological basis of multifactorial traits even more, the contributions of a factor may be additive, synergistic, or completely independent from the contribution of any other factor. Some complex traits manifest as common diseases, such as cardiovascular disease, diabetes, obesity, and high cholesterol. Other complex traits include such phenotypes as the way in which an individual responds to a drug or other medical treatment regimen.

In the recent past, research into the genetic basis for disease has resulted in the development of a few genetic tests for diseases. However, these genetic tests will not be useful for predicting a healthy person's probability of developing a common multifactorial disease. Many argue that genetic testing for common multifactorial traits (e.g. diseases) will not be useful in practice due to the incomplete penetrance and low individual contribution of each gene involved (Holtzman and Marteau, 2000; Vineis et al. 2001). However, these arguments are based in large part on the use of single loci to predict whether or not an individual will exhibit the trait (Beaudet 1999; Evans et al. 2001). What is needed is a reliable approach for determining an individual's risk of developing or exhibiting a multifactorial trait that is based on the individual's genotype at a plurality of loci, each of which is a factor in the manifestation of the multifactorial trait.

SUMMARY

The present application discloses methods for determining an individual's risk of developing or exhibiting a multifactorial trait by determining a score for the individual based on the individual's genotype at a plurality of biallelic polymorphic loci, and comparing that score to at least one threshold value. In certain embodiments, for each of the polymorphic loci the genotype of the individual may be homozygous for an associated allele, homozygous for an unassociated allele, or heterozygous. If the individual's score is greater than a threshold value, then the individual may be considered to be at risk of developing or exhibiting the multifactorial trait, and if the individual's score is equal to or less than a threshold value then the individual may not be considered to be at risk of developing or exhibiting the multifactorial trait. If the individual's score is greater than one threshold value but less than or equal to another threshold value, then the individual may be considered to have an intermediate risk of developing or exhibiting the multifactorial trait.
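The comparison of a score to one or two threshold values can be sketched as follows. This is a minimal illustration only; the function name classify_risk, its parameter names, and the returned labels are hypothetical and are not taken from the application.

```python
def classify_risk(score, lower_threshold, upper_threshold=None):
    """Compare a score to one or two threshold values (illustrative only)."""
    if upper_threshold is not None and lower_threshold > upper_threshold:
        raise ValueError("lower_threshold must not exceed upper_threshold")
    # A score equal to or less than the (lower) threshold is not flagged.
    if score <= lower_threshold:
        return "not at risk"
    # With a single threshold, any score above it is flagged as at risk.
    if upper_threshold is None or score > upper_threshold:
        return "at risk"
    # Greater than one threshold but less than or equal to the other.
    return "intermediate risk"


print(classify_risk(6, 5))       # single threshold
print(classify_risk(7, 5, 8))    # two thresholds
```

With a single threshold the function returns only "at risk" or "not at risk"; supplying a second, higher threshold adds the intermediate category described above.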

The present application further discloses methods for identifying alleles of biallelic polymorphic loci that are associated with a multifactorial trait, herein referred to as “associated alleles”. The methods involve performing an association study in which the genetic composition of a group of individuals who exhibit the multifactorial trait (“case group”) is compared to the genetic composition of a group of individuals who do not exhibit the multifactorial trait (“control group”), and identifying as associated alleles those alleles that are significantly more prevalent in the genetic composition of the case group than the genetic composition of the control group. In certain embodiments, the associated alleles identified in a first association study with a first case group and a first control group are verified by performing a second association study with a second case group and a second control group.

The present application also discloses methods for determining a threshold value for use in a polygenic test. In one aspect, a threshold is determined by analyzing a series of risk cutoff values that are based on a set of scores from a case group and a set of scores from a control group. Determination of a threshold value involves using information including but not limited to the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and positive and negative likelihood ratios (LR+ and LR−) for a polygenic test using each risk cutoff value as a threshold value; clinical data regarding the multifactorial trait, potential treatment options, and the individual being tested; and input from at least one regulatory agency.
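The test characteristics listed above can be tabulated for a candidate risk cutoff from the case and control scores. The sketch below uses the standard textbook definitions of each quantity; the function name cutoff_metrics and the example scores are hypothetical, and the PPV, NPV and accuracy computed this way reflect the case/control mix of the sample rather than the prevalence of the trait in a general population.

```python
def cutoff_metrics(case_scores, control_scores, cutoff):
    """Test characteristics when `cutoff` is used as the threshold value."""
    # A score strictly greater than the cutoff counts as a positive result.
    tp = sum(1 for s in case_scores if s > cutoff)     # cases called positive
    fn = len(case_scores) - tp                         # cases called negative
    fp = sum(1 for s in control_scores if s > cutoff)  # controls called positive
    tn = len(control_scores) - fp                      # controls called negative

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as None.
        return num / den if den else None

    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    defined = sens is not None and spec is not None
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "lr_pos": ratio(sens, 1 - spec) if defined else None,
        "lr_neg": ratio(1 - sens, spec) if defined else None,
    }


# Hypothetical scores, evaluated over a series of candidate cutoffs.
cases = [6, 7, 8, 9]
controls = [3, 4, 5, 7]
for cutoff in (4, 5, 6, 7):
    print(cutoff, cutoff_metrics(cases, controls, cutoff))
```

Running the function over a series of cutoffs gives the raw numbers; the choice of threshold then still depends on the clinical and regulatory considerations named in the paragraph above.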

The present invention further discloses a diagnostic or prognostic assay comprising nucleic acid probes designed to detect the associated alleles in a biological sample. In certain embodiments, the probes of the diagnostic or prognostic assay are bound to a solid substrate.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates an exemplary receiver operating characteristic curve for establishing a threshold value for a polygenic test.

DETAILED DESCRIPTION

General

Certain embodiments of the present invention provide methods for determining with a high degree of certainty the predisposition of an individual for developing or exhibiting a multifactorial trait, which may be, for example, development of a disease or other disorder, or a positive or negative response to a drug. This determination is based on the genotype of the individual at a plurality of genetic loci, each of which is a genetic factor involved in the manifestation of the multifactorial trait. The methods further provide the benefit of making such a determination without the knowledge of the degree to which, or the way in which, each genetic factor influences the manifestation of the multifactorial trait. The methods of the invention instead rely on the cumulative effects of multiple genetic factors and enable one of skill to make an accurate prediction of an individual's likelihood of developing or exhibiting the multifactorial trait based on the genotype of the individual at a plurality of genetic loci that have been determined to be associated with the incidence of the multifactorial trait.

Multifactorial traits are influenced by a plurality of genetic factors, environmental factors, and interactions between them. Further, the contribution of each factor is typically not identical to the contributions of every other factor. That is, for example, some factors may have a strong contribution while others may have a weak contribution. To complicate the biological basis of multifactorial traits even more, the contributions of a factor may be additive, synergistic, or completely independent from the contribution of any other factor. Methods presented herein do not rely on the magnitudes of the effect that each factor has on the multifactorial trait, nor do they depend on whether the effects of the factors are additive, synergistic or independent. In other words, the methods do not require that the magnitude of each factor's effect be taken into consideration when calculating an individual's “risk” (e.g. probability, likelihood) of developing such a multifactorial trait. In addition, the methods do not require knowledge of environmental factors that may influence the multifactorial trait. Instead, the methods presented herein rely on a set of assumptions that the individual contribution of each genetic factor is the same as every other genetic factor's contribution, that the individual contributions are simply additive across all genetic factors underlying the multifactorial trait, and that the risk of an individual may be assessed in the absence of knowledge of the contribution of environmental factors to manifestation of the multifactorial trait.
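One way to read the equal-contribution, additive assumptions is as a simple count of associated alleles across the tested loci. The sketch below is an assumption made for illustration, not a scoring formula stated in this passage: the genotype codes "AA", "AU" and "UU" (homozygous associated, heterozygous, homozygous unassociated) and the function name polygenic_score are hypothetical.

```python
# Number of associated alleles carried for each genotype code (hypothetical coding).
ASSOCIATED_ALLELE_COUNT = {
    "AA": 2,  # homozygous for the associated allele
    "AU": 1,  # heterozygous
    "UU": 0,  # homozygous for the unassociated allele
}


def polygenic_score(genotypes):
    """Sum associated-allele counts over loci, weighting every locus equally."""
    return sum(ASSOCIATED_ALLELE_COUNT[g] for g in genotypes)


# One individual genotyped at four associated loci.
print(polygenic_score(["AA", "AU", "UU", "AU"]))  # 2 + 1 + 0 + 1 = 4
```

Because every locus carries the same weight and the counts are summed, no per-locus effect size or environmental information is needed to compute the score.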

Certain embodiments of the present invention provide methods for performing an association study to identify a set of polymorphic loci associated with a multifactorial trait. Also provided are methods for determining which of the set of associated polymorphic loci to include in a polygenic test for the multifactorial trait, as well as means to determine certain characteristics of such a test, e.g., sensitivity, specificity, positive predictive value, negative predictive value, relative risk, likelihood ratio, accuracy, etc. Further provided are methods for using a set of associated polymorphic loci in a polygenic test to determine the predisposition of an individual for developing or exhibiting that multifactorial trait. In one embodiment, the multifactorial trait is a disease and individuals identified as likely to develop that disease may be subjected to treatments or other medical interventions to treat or prevent development of the disease. In another embodiment, the methods of the present invention are used to predict the efficacy of a proposed medical treatment, wherein if the treatment is unlikely to be efficacious then it is not administered to a patient. In another embodiment, the multifactorial trait is the exhibition of an adverse event in response to a drug treatment. Individuals identified as likely to exhibit the adverse event may be excluded from the drug treatment regimen or if treated with the drug (e.g. as a last resort) additional monitoring may be utilized in anticipation of the adverse event. In still further embodiments, methods disclosed herein are used for drug development, and specifically to increase the efficacy and safety of drugs by selecting appropriate patients for inclusion in studies.

As will be readily apparent to one of skill in the art, the methods of the present invention are to be used as tools to aid in the identification of individuals who have or are at risk of developing a multifactorial trait of interest, and the methods presented herein may be used in conjunction with clinical information regarding the trait, individual(s) being tested, and the population from which the individual(s) is selected, as well as other clinical tests and even the clinical “intuition” of the practitioner. Genetic tests are typically used to assist clinicians, not to rule clinical decision-making. Essentially, it is the clinician who must determine how to use a diagnostic or prognostic test using, for example, clinical knowledge of the trait (e.g. disease) and the potential treatment options, the characteristics of the diagnostic test, the population with which the test was developed, and the specific patient being tested, while balancing the risks to individuals incorrectly identified by the test and the benefits to individuals correctly identified. In another aspect, a clinician may also consider the risks to individuals incorrectly identified as “positive” by the test as compared to the risks to individuals incorrectly identified as “negative” by the test (e.g., does withholding treatment from a patient in need of such treatment cause more harm than administering treatment to a patient who does not need it?).

Reference will now be made in detail to various embodiments and particular applications of the invention. While the invention will be described in conjunction with the various embodiments and applications, it will be understood that such embodiments and applications are not intended to limit the invention. On the contrary, the invention is intended to cover alternatives, modifications and equivalents that may be included within the spirit and scope of the invention.

Association Studies

In one aspect of the present invention, a set of polymorphic loci associated with the manifestation of a multifactorial trait and the associated alleles that correspond to those polymorphic loci are identified by carrying out an association study, and the associated alleles are further used to determine if an individual who is not a member of the case or control groups is genetically predisposed to developing or exhibiting the multifactorial trait. A multifactorial trait may be any type of phenotypic trait, such as exhibition of, susceptibility to, or resistance to a disease or other medical disorder, a response to a drug or other medical treatment regimen, or another physical or mental characteristic. For example, in one embodiment the multifactorial trait is a disease and an association study compares the genetic composition of a group of individuals who exhibit the disease (cases) with the genetic composition of a group of individuals who do not exhibit the disease (controls). Examples of diseases that are multifactorial include, but are not limited to, asthma and other pulmonary diseases, psoriasis, dyslexia, infertility, gout, cataracts, obesity, diabetes, gastrointestinal disorders, cancer, cardiovascular disease, stroke, hypertension, attention deficit disorder, schizophrenia, manic depression, osteoporosis, immune system disorders, multiple sclerosis, atherosclerosis, and epilepsy. Certain developmental abnormalities are also included in this category, such as cleft lip/palate, congenital heart defects and neural tube defects. In another embodiment the multifactorial trait is a response to a drug and an association study compares the genetic composition of a group of individuals who exhibit a particular response to the drug (cases) with the genetic composition of a group of individuals who do not exhibit the particular response (controls). In one aspect, the drug response may be related to the efficacy of the drug. For example, the drug may be highly efficacious for individuals in the case group and have poor efficacy for individuals in the control group, or vice versa. In another aspect, the drug response may be related to an adverse event in response to administration of the drug. For example, the individuals in the case group may exhibit an adverse event in response to the drug and the individuals in the control group may not exhibit the adverse event. Although various examples are provided herein that describe uses of the methods of the present invention in combination with specific multifactorial traits, these examples are not intended to limit the scope of the invention, which encompasses use of the methods presented herein in conjunction with any multifactorial trait whose manifestation involves a plurality of genetic loci.

Typically, at least 50, and preferably at least 100, individuals are in each of the case and control groups. In some studies, there are at least 200, or at least 500, individuals in at least one of the case and control groups. Often, there are more individuals in the control group than in the case group. In certain embodiments, the individuals in the case and control groups are mammals, but the case and control groups may also comprise nonmammalian individuals such as, for example, bacteria, fungi, protists, viruses, archaeans, and other eukaryotes such as reptiles, amphibians, fish, birds, crustaceans, insects, and plants. In some embodiments the individuals in the case and control groups are humans.

Typically, the composition of the case and control groups should be similar with regard to characteristics aside from the multifactorial trait under consideration. For example, in one embodiment, similar numbers of men and women of similar ages will be selected for each group. In certain embodiments, an environmental risk factor may influence the composition of the case and control groups. For example, only smokers (or only nonsmokers) may be selected to comprise the case and control groups for a study on lung cancer. In some embodiments of the present invention, membership of the case and control groups is adjusted so that the population structures of the two groups are “matched” prior to performing an association study. Population structure (or “population stratification”) refers to the heterogeneity of the genetic composition of individuals within a population. For example, the population structure of a case group that is composed mainly of Italians is different from that of a control group that is composed mainly of Mexicans due to the different ethnic origins of the two groups. If an association study were performed without matching the groups, then genetic loci that are associated with Italian ancestry, but not with Mexican ancestry, may erroneously appear to be associated with the multifactorial trait under study. By matching the population structure of the case and control groups, one of skill can control for the genetic differences between the case and control groups that are not related to the multifactorial trait of interest. Therefore, the genetic differences between the groups that are identified by the subsequent association study are more likely to be loci that are causally related to the multifactorial trait of interest. Methods for matching case and control groups prior to performing an association study are described in detail in U.S. utility patent application Ser. No. 10/427,696, filed Apr. 30, 2003, entitled “Method for Identifying Matched Groups”; and U.S. provisional patent application No. 60/497,771, filed Aug. 26, 2003, entitled “Matching Strategies for Genetic Association Studies in Structured Populations”.

Nucleic acid samples are collected from the individuals in the case and control groups for use in genotyping assays. The nucleic acid samples may be DNA or RNA and may be obtained from various biological samples such as, for example, whole blood, semen, saliva, tears, fecal matter, urine, sweat, buccal, skin and hair. In certain aspects, the nucleic acid samples comprise genomic DNA. Sample nucleic acids may be prepared for analysis using any technique known to those skilled in the art. Preferably, such techniques result in the production of a nucleic acid molecule sufficiently pure to determine the presence or absence of one or more polymorphisms at one or more locations in the nucleic acid molecules. Such techniques are commonly known and may be found, for example, in Sambrook, et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, New York) (2001), and Ausubel, et al., Current Protocols in Molecular Biology (John Wiley and Sons, New York).

One or more nucleic acids of interest may be amplified and/or labeled before determining the presence or absence of one or more polymorphisms in the nucleic acid. Any amplification technique known to those of skill in the art may be used in conjunction with certain methods of the present invention including, but not limited to, polymerase chain reaction (PCR) techniques. PCR may be carried out using materials and methods known to those of skill in the art. See generally PCR Technology: Principles and Applications for DNA Amplification (ed. H. A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19: 4967 (1991); Eckert et al., PCR Methods and Applications 1: 17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S. Pat. No. 4,683,202. Other suitable amplification methods include the ligase chain reaction (LCR) (see Wu and Wallace, Genomics 4: 560 (1989) and Landegren et al., Science 241: 1077 (1988)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86: 1173 (1989)), self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87: 1874 (1990)) and nucleic acid sequence-based amplification (NASBA). Further, the methods disclosed in pending U.S. patent application Ser. Nos. 10/106,097, filed Mar. 26, 2002, entitled “Methods for Genomic Analysis”; 10/042,406, filed Jan. 9, 2002, entitled “Algorithms for Selection of Primer Pairs”; 10/042,492, filed Jan. 9, 2002, entitled “Methods for Amplification of Nucleic Acids”; 10/236,480, filed Sep. 5, 2002, entitled “Methods for Amplification of Nucleic Acids”; 10/174,101, filed Jun. 17, 2002, entitled “Methods for Storage of Reaction Cocktails”; 10/447,685, filed May 28, 2003, entitled “Liver Related Disease Compositions and Methods”; 10/768,788, filed Mar. 4, 2004, entitled “Apparatus and Methods for Analyzing and Characterizing Nucleic Acid Sequences”; and 10/427,696, filed Apr. 30, 2003, entitled “Method for Identifying Matched Groups” are suitable for amplifying, labeling, or further manipulating (e.g. fragmenting) nucleic acids for use in certain methods of the present invention.

In an association study, genetic loci that are known to be polymorphic (e.g. SNPs) are genotyped for each individual in each of the case and control groups and a relative allele frequency is calculated for each of the loci for each of the groups based on the genotypes present in the groups. That is, if ten polymorphic loci are genotyped, then twenty relative allele frequencies are determined, ten for each of the case and control groups. For a given polymorphic locus, the relative allele frequency for the case group is compared to that for the control group, and if the polymorphic locus has a significantly different relative allele frequency in the case group than in the control group it is identified as a locus that may be associated with the multifactorial trait that distinguishes the case and control groups (“associated locus”). In certain embodiments, a significant difference in relative allele frequency is a difference of greater than about 5%, or greater than about 8%, or greater than about 10%, or greater than about 12%, or greater than about 15%. The allele that is present more often in the case population may be referred to as the “associated allele”, and the allele that is present more often in the control population may be termed the “unassociated allele”. The number of associated loci (and, hence, associated alleles for biallelic associated loci) identified will vary widely depending on how many polymorphic loci contribute to the multifactorial trait (e.g. disease) under study or are in linkage disequilibrium with loci that contribute. For example, if the manifestation of a disease involves ten genes, then the number of associated loci identified will be dependent on how many of the polymorphic loci that are genotyped in the association study are in linkage disequilibrium with the alleles of the ten genes that cause the disease. Typically, the number of loci involved in the manifestation of a multifactorial trait ranges from about five to several hundred, but it may be higher or lower. For a detailed description of methods for performing an association study using relative allele frequencies of a case and a control group, see U.S. patent application Nos. 60/460,329, filed Apr. 3, 2003, and Ser. No. 10/768,788, filed Jan. 30, 2004, both of which are entitled “Apparatus and Methods for Analyzing and Characterizing Nucleic Acid Sequences”.
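The comparison of case and control relative allele frequencies can be sketched as below. This is an illustration under stated assumptions: each frequency is taken to be the relative frequency of one arbitrarily chosen "reference" allele at a biallelic locus, a plain difference cutoff (here 10%) stands in for whatever significance criterion a real study would apply, and the function name, locus names and frequencies are hypothetical.

```python
def find_associated_loci(case_freq, control_freq, min_difference=0.10):
    """Flag loci whose case and control allele frequencies differ by more
    than `min_difference`, and report which allele is the associated one.

    case_freq, control_freq: dict mapping locus name -> relative frequency
    (0.0 to 1.0) of the reference allele in that group.
    """
    associated = {}
    for locus, p_case in case_freq.items():
        diff = p_case - control_freq[locus]
        if abs(diff) > min_difference:
            # The allele that is more frequent in cases is the associated
            # allele; at a biallelic locus the other allele is unassociated.
            associated[locus] = "reference" if diff > 0 else "alternate"
    return associated


case = {"snp1": 0.60, "snp2": 0.30, "snp3": 0.50}
control = {"snp1": 0.45, "snp2": 0.45, "snp3": 0.52}
print(find_associated_loci(case, control))
# snp1 and snp2 differ by about 15%; snp3 differs by only 2% and is not flagged.
```

When the reference allele is less frequent in cases, the alternate allele is necessarily more frequent there, which is why a single frequency per locus and group suffices for a biallelic marker.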

Genotyping of the individuals may be performed using any technique known to those of skill in the art. Preferred techniques permit rapid, accurate determination of multiple variations with a minimum of sample handling. Some examples of suitable techniques include, but are not limited to, direct DNA sequencing, capillary electrophoresis, hybridization, allele-specific probes or primers, single-strand conformation polymorphism analysis, nucleic acid arrays, bead arrays, restriction fragment length polymorphism analysis, cleavase fragment length polymorphism analysis, random amplified polymorphic DNA, ligase detection reaction, heteroduplex or fragment analysis, differential sequencing with mass spectrometry, atomic force microscopy, pyrosequencing, FRET (e.g., TaqMan (Applied Biosystems, Inc., Foster City, Calif.) and Molecular Beacon (Stratagene, La Jolla, Calif.) assays), and other techniques well known in the art. Several methods for DNA sequencing are well known and generally available in the art. See, for example, Sambrook, et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, New York) (2001); Ausubel, et al., Current Protocols in Molecular Biology (John Wiley and Sons, New York) (1997); Twyman, et al. (2003) “Techniques Patents for SNP Genotyping”, Pharmacogenomics 4(1):67-79; and Kristensen, et al. (2001) “High-Throughput Methods for Detection of Genetic Variation”, BioTechniques 30(2):318-332. For details on the use of nucleic acid arrays (DNA chips) for the detection of, for example, SNPs, see U.S. Pat. No. 6,300,063 issued to Lipshutz, et al.; U.S. Pat. No. 5,837,832 to Chee, et al.; and the HuSNP Mapping Assay, reagent kit and user manual, Affymetrix Part No. 90094 (Affymetrix, Santa Clara, Calif.).

The relative allele frequency for a case or control group may be determined directly, by individually genotyping all the individuals in the population to determine the exact amount of each allele in each individual in the population. Methods for individually genotyping a plurality of individuals are described in detail in U.S. patent application Ser. No. 10/351,973, filed Jan. 27, 2003, entitled “Apparatus and Methods for Determining Individual Genotypes” and U.S. patent application no. (unassigned), attorney docket no. 100/1046-20, filed Feb. 24, 2004, entitled “Improvements to Analysis Methods for Individual Genotyping”. Alternatively, pooled genotyping may be used to determine a relative allele frequency for each of the case and control groups. For pooled genotyping, nucleic acid samples from the case group are pooled together (case pool) and nucleic acid samples from the control group are pooled together (control pool), and the relative allele frequencies for the case group and the control group are determined through analysis of the case and control pools. Methods for pooled genotyping are discussed in detail in U.S. patent application Nos. 60/460,329, filed Apr. 3, 2003, and Ser. No. 10/768,788, filed Jan. 30, 2004, both of which are entitled “Apparatus and Methods for Analyzing and Characterizing Nucleic Acid Sequences”.
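For the direct (individual genotyping) route, a relative allele frequency is simply the count of one allele divided by the total number of allele copies observed in the group. The sketch below assumes genotypes are recorded as short strings of allele letters (two letters for a diploid locus, one for a haploid locus); that encoding and the function name are hypothetical, and the sketch does not cover the pooled approach.

```python
def relative_allele_frequency(genotypes, allele):
    """Fraction of all allele copies in a group that are `allele`.

    genotypes: list of strings such as "AG" (diploid) or "A" (haploid),
    one entry per individual at a single locus.
    """
    total_copies = sum(len(g) for g in genotypes)
    if total_copies == 0:
        raise ValueError("no genotypes supplied")
    allele_copies = sum(g.count(allele) for g in genotypes)
    return allele_copies / total_copies


# Four diploid individuals at one biallelic locus: 4 copies of A out of 8.
print(relative_allele_frequency(["AA", "AG", "GG", "AG"], "A"))  # 0.5
```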

Genetic Loci

The term “SNP” or “single nucleotide polymorphism” refers to a genetic variation between individuals; e.g., a single nitrogenous base position in the DNA of organisms that is variable. SNPs are found across the genome; much of the genetic variation between individuals is due to variation at SNP loci, and often this genetic variation results in phenotypic variation between individuals. SNPs for use in the present invention and their respective alleles may be derived from any number of sources, such as public databases (U.C. Santa Cruz Human Genome Browser Gateway (http://genome.ucsc.edu/cgi-bin/hgGateway) or the NCBI dbSNP website (http://www.ncbi.nlm.nih.gov/SNP/)), or may be experimentally determined as described in U.S. Pat. No. 6,969,589 and U.S. patent application Ser. No. 10/284,444, filed Oct. 31, 2002, entitled “Human Genomic Polymorphisms”. Although the use of SNPs is described in some of the embodiments presented herein, it will be understood that other biallelic genetic markers may also be used. A biallelic genetic marker is one that has two polymorphic forms, or alleles. As mentioned above, for a biallelic genetic marker that is associated with a trait, the allele that is more abundant in the genetic composition of a case group as compared to a control group is termed the “associated allele”, and the other allele may be referred to as “the unassociated allele”. Thus, for each biallelic polymorphism that is associated with a given trait (e.g., a disease or drug response), there is a corresponding associated allele. Other biallelic polymorphisms that may be used with the methods presented herein include, but are not limited to, multinucleotide changes, insertions, deletions, and translocations. It will be further appreciated that references to DNA herein may include derivatives of DNA such as amplicons, RNA transcripts, cDNA, DNA analogs, etc. The polymorphic loci that are screened in an association study may be in a diploid or a haploid state and, ideally, would be from sites across the genome.

In some embodiments of the present invention, an association study involves screening at least about 100 SNPs, or at least about 500 SNPs, or at least about 1,000 SNPs, or at least about 10,000 SNPs, or at least about 100,000 SNPs, or at least about 1,000,000 SNPs. In certain embodiments, SNPs that are located in one or more parts of the genome believed to be associated with the multifactorial trait are screened. In other embodiments, SNPs on one or more chromosomes are screened. In still further embodiments, SNPs from every chromosome in a genome are screened. In other embodiments, multiple SNPs from every chromosome in a genome are screened. In other embodiments, SNPs that are located in the coding region or the regulatory region of a gene are screened. In further embodiments, SNPs that have been found to be associated with the differential allelic expression of a gene are screened. (Differential allelic expression occurs when one allele of a gene is expressed at a higher level than another allele of the same gene in a heterozygote, and is described in detail in U.S. patent application Ser. No. 10/438,184, filed May 13, 2003, entitled “Allele-specific Expression Patterns”.) In certain embodiments, all known SNPs (approximately 3 million to date) are screened. In other embodiments, a subset of SNPs is screened that may be used to predict the allelic composition of a subset of SNPs that is not screened. SNPs screened by the methods presented herein may be in either a diploid or a haploid state in an individual.

The number of associated SNPs (and therefore associated alleles) identified by the methods presented herein is dependent on several criteria. First, it is dependent on the number of genetic loci that are involved in the manifestation of the disease. For example, if the genetic basis for a multifactorial disease involves only a few loci, then the number of associated SNPs and associated alleles will typically be less than that found for a multifactorial disease whose genetic basis involves hundreds of loci. Further, the number of associated SNPs and associated alleles identified is dependent on how many SNPs are screened in the association study. For example, an association study that screens only one hundred SNPs in the case and control groups will be less likely to find a large number of associated SNPs than one that screens one million SNPs. Typically, the methods presented herein will identify between about ten and several hundred associated SNPs/associated alleles, but may identify more or fewer.

Validation of the Set of Associated Alleles

In one embodiment, to validate the identification of the associated alleles, the association study is repeated using a second case and second control population. This second association study determines whether those associated alleles from the first association study are still identified as associated alleles based on the relative allele frequencies of a new set of cases and controls, and those that do “replicate” have thereby been validated as associated SNPs. In certain embodiments, the polymorphic loci that were screened in the first association study are also screened in the second validating association study. In other embodiments, a subset of the polymorphic loci that were screened in the first association study are screened in the second validating association study. In a specific embodiment, the set of polymorphic loci screened in the second association study comprises the associated polymorphic loci that were identified by the first association study. For example, if 30,000 SNPs are identified as associated with the incidence of a disease in a first association study, then those 30,000 SNPs are subsequently screened in a second association study for which a second case group of individuals exhibiting the disease and a second control group not exhibiting the disease are selected. In certain embodiments, the second case group is selected according to the same criteria as the first case group, and the second control group is selected according to the same criteria as the first control group. In one aspect, the first and second case group and the first and second control group have no members in common. The second association study may be performed using a pooled or individual genotyping methodology.

In other aspects, if an association study is performed using pooled genotyping, the set of associated alleles determined by the pooled genotyping methodology may be validated by individually genotyping the set of associated SNPs in every individual in the case and control groups and recalculating and recomparing the relative allele frequencies. The associated alleles that were identified based on the initial pooled genotyping analysis that have a significantly higher allele frequency in the case group as compared to the control group based on the individual genotyping data are thereby verified as associated alleles. This validation step may be performed for a first association study that utilizes a pooled genotyping methodology, or may be performed for a second validating association study that uses a pooled genotyping methodology.

More than one validation method may be used in a study design to identify a set of associated SNPs. For example, in one embodiment of the present invention, an initial association study is performed with a case population of individuals that exhibit a disease and a control population of individuals that do not exhibit the disease. A pooled genotyping methodology is used to genotype the case and control groups at approximately 1.5 million SNP loci to identify about 30,000 SNPs with relative allele frequencies that differ significantly between the case and control groups. In a next validating step, the case and control groups are individually genotyped at each of the about 30,000 SNPs identified in the “pooled” association study to identify approximately 300 SNPs that have significantly different relative allele frequencies in the case group than in the control group based on the individual genotyping methodology. Thus, these approximately 300 SNPs have been validated by individual genotyping. In a further validating step, a second association study based on an individual genotyping methodology is performed with a second case group and a second control group to further validate the approximately 300 SNPs validated by the individual genotyping step. Those SNPs that replicate in the second association study are classified as associated SNPs for the disease, and the alleles of the associated SNPs that are more abundant in the case groups than in the control groups are termed the associated alleles.

Use of Associated Alleles for Determining Risk Cutoffs

In one embodiment of the present invention, the genotypes of the individuals in the case and control groups at each of the disease-associated SNP loci are used to develop a series of cutoff values to be used in determining the predisposition of an individual for developing the multifactorial trait that distinguishes the case group from the control group.

In one aspect, the genotypes at each associated SNP location are collected for all the individuals in the case and control groups. If individual genotyping was performed during the association study, as discussed supra, then the genotyping data collected for the associated SNP positions during the association study may be used. However, if individual genotypes have not been determined, then each member of the case and control group must be individually genotyped for the set of associated SNPs. For example, in the case of a biallelic SNP, a diploid individual may have one of three different genotypes: homozygous for the associated allele, homozygous for the unassociated allele, or heterozygous (having one associated allele and one unassociated allele). The methods presented herein may also be applied to haploid organisms, or to haploid loci in diploid organisms (e.g., Y chromosome loci in humans). For a haploid locus, there would be only two genotypes, one for each possible allele.

In another aspect, each individual in the case and control groups is assigned a score based on their genotype at each of the associated SNP loci. Each associated allele is valued at one point, so each SNP genotype that is homozygous for the associated allele is worth two points, each SNP genotype that is heterozygous is worth one point, and each SNP genotype that is homozygous for the unassociated allele is worth zero points. In one embodiment for a haploid locus, each SNP genotype with the associated allele is worth one point and each SNP genotype with the unassociated allele is worth zero points. In another embodiment for a haploid locus, each SNP genotype with the associated allele is worth two points and each SNP genotype with the unassociated allele is worth zero points. For a given individual, all the points across all the associated SNPs are summed to provide a score for that individual. For example, if 100 associated SNPs are being genotyped, then the maximum score for an individual is 200, meaning that the individual has two associated alleles at every associated SNP position. In other words, the individual is homozygous for the associated allele at every SNP location. The minimum score is 0, for an individual that has no associated alleles at any associated SNP positions, or is homozygous for the unassociated allele at every SNP location. Scores are calculated for every individual in the case and control groups. For example, 100 associated SNPs are examined for a case population of 102 individuals and a control population of 405 individuals. The lowest score in the case group is 42 and the highest score is 97; for the control group, the lowest score is 23 and the highest score is 79.
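The scoring rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the representation of a genotype as a string of allele letters (e.g. "AG" for a diploid locus, "A" for a haploid locus) and the names `locus_points`, `individual_score` and `haploid_points` are assumptions introduced here, not part of the described methods.

```python
def locus_points(genotype, associated_allele, haploid_points=1):
    """Points contributed by one associated locus.

    genotype: string of observed alleles, e.g. "AG" (diploid) or "A" (haploid).
    This string encoding is an assumption made for illustration.
    """
    copies = sum(1 for allele in genotype if allele == associated_allele)
    if len(genotype) == 1:
        # Haploid locus: worth haploid_points (1 or 2, depending on the
        # embodiment) when the associated allele is present, else 0.
        return haploid_points * copies
    # Diploid locus: 2 (homozygous associated), 1 (heterozygous), or 0.
    return copies


def individual_score(genotypes, associated_alleles, haploid_points=1):
    """Sum the points across all associated loci for one individual."""
    if len(genotypes) != len(associated_alleles):
        raise ValueError("one genotype is required per associated locus")
    return sum(
        locus_points(g, a, haploid_points)
        for g, a in zip(genotypes, associated_alleles)
    )
```

With 100 diploid loci, an individual homozygous for the associated allele at every locus would score 200 under this sketch, and an individual with no associated alleles would score 0, matching the maximum and minimum scores given in the text.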

In another aspect, a series of risk cutoff values is determined. Risk cutoff values represent hypothetical threshold values for use in a genetic test to identify individuals likely to develop or exhibit a multifactorial trait. For example, individuals who have a score higher than a threshold value may be diagnosed as being likely to exhibit the multifactorial trait, and those who have a score at or lower than the threshold may be diagnosed as being unlikely to exhibit the multifactorial trait. Alternatively, multiple thresholds may be used to determine an individual's risk of exhibiting the multifactorial trait.

The series of risk cutoff values spans a range from 1 to the highest score calculated for an individual in the association study, regardless of whether they were a member of the case or control group. In one example, the highest score for an individual is 97 points, so the range from which the risk cutoff values are determined (the risk cutoff range) is between 1 and 97. In certain aspects, risk cutoff values are selected from across the risk cutoff range, although the selection of particular risk cutoff values is somewhat arbitrary. In some embodiments, every score within the risk cutoff range is chosen. In other embodiments, every nth score (every 5th or 10th, for example) is chosen. In still further embodiments, the range is divided into percentages and every nth percentage is chosen. In some embodiments, more risk cutoff values are selected from the middle portion of the complete range of scores than at the top or bottom portions of the range, or vice versa. For example, in the case in which the complete range of scores is between 1 and 97, risk cutoff values are chosen at every 10th score between 20 and 80, and additional risk cutoff values of 55 and 65 are added to better assess the middle of this range (see Table 1).
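As a small illustration of the last example, the series of cutoffs at every 10th score between 20 and 80, supplemented with 55 and 65, might be generated as follows. The function name `cutoff_series` and its parameters are hypothetical and serve only this sketch.

```python
def cutoff_series(low, high, step, extra=()):
    """Every `step`-th score from low to high (inclusive), plus any extra
    cutoffs added to sample part of the range more densely."""
    return sorted(set(range(low, high + 1, step)) | set(extra))


# Every 10th score between 20 and 80, with 55 and 65 added for the middle.
cutoffs = cutoff_series(20, 80, 10, extra=(55, 65))
```

Choosing every score in the range corresponds to `step=1`; the denser sampling of the middle is expressed through `extra`.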

In a subsequent step, each of the risk cutoff values is compared to the scores calculated for the individuals in the case and control groups. Specifically, the scores for the case (“affected”) and control (“unaffected”) individuals are used to determine which of the risk cutoff values provides the best sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy or a combination thereof for distinguishing individuals likely to exhibit the multifactorial trait from those not likely to exhibit the multifactorial trait, thereby identifying a risk cutoff value that would be a good threshold value for a polygenic test using the associated SNPs. In addition, identification of an appropriate threshold value may further involve use of clinical information (e.g. regarding the multifactorial trait, population under study, or individuals being tested) and/or interaction of the practitioner of the present invention with an outside agency (e.g. U.S. Food and Drug Administration (FDA)). This threshold value may be developed into a genetic test, e.g. a diagnostic, with the sensitivity, specificity, PPV, NPV and accuracy calculated based on the threshold value and the scores for the case and control group individuals.

A two-class genetic test has two possible results. A positive test result indicates that an individual exhibits or is likely to exhibit a trait of interest, and a negative test result indicates that an individual does not exhibit and is not likely to exhibit the trait of interest. As such, the reliability of a genetic test is related to how often the result of the test correctly identifies an individual as “positive” or “negative” for the trait. True positives (TP) and true negatives (TN) are test results that accurately identify individuals as positive (e.g. “affected”) and negative (e.g. “unaffected”), respectively. A false positive (FP) is a test result that incorrectly classifies an individual as a positive when they are in fact negative for the trait. Likewise, a false negative (FN) is a test result that incorrectly classifies an individual as a negative when they are in fact positive for the trait. Measures of TP, TN, FP and FN are used to calculate the sensitivity, specificity, PPV and NPV for a genetic test.

The “sensitivity” of a test is a measure of the ability of the test to correctly identify an affected individual, or an individual who will develop the trait of interest. The closer the sensitivity is to one, the more accurate the test is in identifying affected individuals. Specifically, the sensitivity refers to the proportion of affected individuals who are correctly diagnosed as such by the test, and is calculated as the number of individuals correctly identified as affected (TP) divided by the total number of affected individuals (TP+FN). A high sensitivity is preferred so that most affected individuals are identified as such by the genetic test. The “specificity” of a test is a measure of the ability of the test to correctly identify an unaffected individual, or an individual who will not develop the trait of interest. The closer the specificity is to one, the more accurate the test is in identifying unaffected individuals. Specifically, the specificity refers to the proportion of unaffected individuals who are correctly identified as such by the test, and is calculated as the number of individuals correctly identified as unaffected (TN) divided by the total number of unaffected individuals (TN+FP). A high specificity is preferred so that the number of individuals who are incorrectly identified as affected is minimized. Thus, for a given risk cutoff value, the sensitivity is calculated as the proportion of case individuals with a score higher than the risk cutoff value, and the specificity is calculated as the proportion of control individuals with a score lower than or equal to the risk cutoff value (or, one minus the proportion of control individuals with a score higher than the risk cutoff value).
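The calculation in the preceding paragraph can be illustrated with a short sketch that takes the lists of case and control scores and a risk cutoff value. The function name and the toy score lists in the test values are assumptions for illustration only; a score strictly higher than the cutoff is treated as a positive result, following the convention stated in the text.

```python
def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Sensitivity and specificity of a test that calls a score > cutoff positive.

    Assumes both score lists are non-empty.
    """
    tp = sum(1 for s in case_scores if s > cutoff)      # cases called positive
    fp = sum(1 for s in control_scores if s > cutoff)   # controls called positive
    fn = len(case_scores) - tp                          # cases called negative
    tn = len(control_scores) - fp                       # controls called negative
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity
```

Repeating this calculation for each value in the series of risk cutoff values yields one sensitivity/specificity pair per cutoff.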

The “positive predictive value” (PPV) of a genetic test assesses the reliability of a positive test outcome/result, and is computed as the proportion of people with a positive test result who actually have the trait of interest. In other words, it is the probability that a positive test result accurately identifies an individual who has the trait, and is calculated as the number of individuals correctly identified as affected (TP) divided by the total number of individuals identified as affected by the genetic test (TP+FP). In many cases, a high PPV is preferred so that most individuals who are identified as affected are actually affected. For example, a PPV of 0.98 means that an individual with a positive test result has a 98% chance of having or developing the trait. The “negative predictive value” (NPV) of a genetic test assesses the reliability of a negative test outcome/result, and is computed as the proportion of people with a negative test result who do not have the trait of interest. Put another way, it is the probability that a negative test result accurately identifies an individual who does not have the trait, and is calculated as the number of individuals correctly identified as unaffected (TN) divided by the total number of individuals identified as unaffected (TN+FN). A high NPV is sometimes preferred so that most individuals who are identified as unaffected are actually unaffected (e.g., in excluding subjects at risk for adverse events associated with the administration of a specific drug). For example, an NPV of 0.999 means that an individual with a negative test result has only a 0.1% chance of having or developing the trait (e.g., of experiencing the adverse event in response to the drug). Thus, for a given risk cutoff value, the PPV may be calculated as the proportion of all individuals with a score higher than the risk cutoff value that are actually in the case group, and the NPV is calculated as the proportion of all individuals with a score lower than or equal to the risk cutoff value that are actually in the control group.

The prevalence of a trait is the frequency of the trait among the population being tested, and is calculated as the number of existing cases divided by the total population at a given point in time. Although the sensitivity and specificity of a test are not influenced by the prevalence of the trait under consideration, both PPV and NPV are highly influenced by the prevalence of the trait in the population being tested; a lower disease prevalence results in a lower PPV and a higher NPV. Both PPV and NPV may also be calculated as a function of the sensitivity (sens), specificity (spec) and prevalence (prev):

PPV = (sens)(prev) / [(sens)(prev) + (1−spec)(1−prev)]

NPV = (spec)(1−prev) / [(spec)(1−prev) + (1−sens)(prev)]
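The two prevalence-based formulas translate directly into code. The sketch below is illustrative; the function name `ppv_npv_from_prevalence` is an assumption, and the inputs are expected to be proportions between 0 and 1 that do not make either denominator zero.

```python
def ppv_npv_from_prevalence(sens, spec, prev):
    """PPV and NPV from sensitivity, specificity and prevalence."""
    # Fraction of the population that is affected and tests positive,
    # over the fraction of the population that tests positive.
    ppv = (sens * prev) / (sens * prev + (1.0 - spec) * (1.0 - prev))
    # Fraction of the population that is unaffected and tests negative,
    # over the fraction of the population that tests negative.
    npv = (spec * (1.0 - prev)) / (spec * (1.0 - prev) + (1.0 - sens) * prev)
    return ppv, npv
```

For instance, a test with a sensitivity of 0.70 and a specificity of 0.90 applied to a trait with a prevalence of 1/1000 gives a PPV of about 0.007 under these formulas, which shows how strongly a low prevalence depresses the PPV.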

Threshold values may also be selected using likelihood ratios for the genetic test. A likelihood ratio (LR) is a way to incorporate the sensitivity and specificity of a test into one measure, and gives an indication of how much the odds of having or developing a given trait change based on a positive or negative test result. Since sensitivity and specificity are fixed characteristics of the test itself, the LR is independent of the prevalence of the trait in the population, unlike PPV and NPV. An LR is the likelihood that a given test result would be expected in an individual with the trait compared to the likelihood that the same result would be expected in an individual without the trait. An LR for a positive test result (LR+) provides a measure of how much the odds of an individual having or developing the trait increase when the test is positive, and is calculated as the sensitivity divided by (1−specificity). The better test to use for “ruling in” a trait is the one with the largest LR+. An LR for a negative test result (LR−) provides a measure of how much the odds of an individual having or developing the trait decrease when the test is negative, and is calculated as (1−sensitivity) divided by the specificity. The better test to use to “rule out” a trait is the one with the smaller LR−. LRs of greater than 10 or less than 0.1 are usually judged to be of high diagnostic value. The LRs are combined with the “pre-test odds” to determine the “post-test odds” that the individual tested has or will develop the trait of interest (post-test odds=pre-test odds×LR). The pre-test odds are computed with information about the prevalence of the trait, the characteristics of the population and information about the particular individual being tested, and represent the likelihood that the individual will have or develop the trait prior to testing. The post-test odds represent the likelihood that the individual will have or develop the trait given the testing results. In one embodiment of the present invention, a threshold value is selected that maximizes the LR for a genetic test.
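A brief sketch of the likelihood ratio calculations follows. The function names are assumptions, and the conversion between probability and odds (odds = p/(1−p)) is the standard definition rather than something specified in the text; a pre-test probability below 1 is assumed. An LR is reported as infinite when its denominator is zero.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity) if specificity < 1.0 else math.inf
    lr_neg = (1.0 - sensitivity) / specificity if specificity > 0.0 else math.inf
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, lr):
    """Apply post-test odds = pre-test odds x LR, then convert back to a probability."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)
```

As an illustration, a sensitivity of 0.77 and a specificity of 0.92 give an LR+ of about 9.6 and an LR− of 0.25; with a pre-test probability of 4%, a positive result raises the probability to roughly 29% under this sketch.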

Yet another measure of the value or utility of a genetic test is the accuracy, which measures the overall agreement between the test results and the actual disease state. Accuracy is calculated as the sum of the true positives and true negatives divided by the total number of sample results ((TP+TN)/(TP+TN+FP+FN)). The accuracy of a genetic test may be used to determine which of a set of risk cutoff values may be a useful threshold value in a polygenic test.

Sensitivity, specificity, PPV, NPV and accuracy are calculated for each risk cutoff value, and Table 1 below lists these values for an example in which 102 cases and 405 controls are analyzed. The cutoff values chosen from the complete range of scores are shown in the first column. The number of case individuals with a score higher than the corresponding cutoff value is shown in the second column. The third column lists the number of control individuals with a score higher than the corresponding cutoff value. The sensitivity for a test using each of the corresponding cutoff values as threshold values is shown in the fourth column. The specificity for a test using each of the corresponding cutoff values as threshold values is shown in the fifth column. The PPV and NPV of a test using each of the corresponding risk cutoff values as threshold values are shown in the sixth and seventh columns, respectively. Finally, the accuracy of a test using each of the corresponding cutoff values as threshold values is shown in the eighth column.

TABLE 1

Risk Cutoff   # Cases        # Controls
Values        (out of 102)   (out of 405)   Sensitivity   Specificity   PPV    NPV    Accuracy
80            25             0              0.25          1             1      0.84   0.85
70            51             2              0.50          0.995         0.96   0.89   0.90
65            65             8              0.64          0.98          0.89   0.91   0.91
60            79             34             0.77          0.92          0.70   0.94   0.89
55            93             81             0.91          0.80          0.53   0.97   0.82
50            99             154            0.97          0.62          0.39   0.99   0.69
40            102            318            1             0.21          0.24   1      0.37
30            102            394            1             0.03          0.21   1      0.22
20            102            405            1             0             0.20   1      0.20
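The entries of Table 1 follow from the second and third columns alone. The sketch below recomputes one row from the number of cases and controls scoring above a cutoff; the function name `table_row` and the dictionary keys are assumptions made for illustration. A ratio whose denominator is zero is returned as `None` in this sketch (this occurs for the NPV at a cutoff of 20, where no individual tests negative and the table lists 1).

```python
def table_row(cases_above, controls_above, n_cases=102, n_controls=405):
    """Recompute the Table 1 statistics for one risk cutoff value."""
    tp, fp = cases_above, controls_above
    fn, tn = n_cases - tp, n_controls - fp

    def ratio(numerator, denominator):
        # Undefined ratios (zero denominator) are reported as None.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Row for a risk cutoff value of 60: 79 cases and 34 controls score higher.
row_60 = table_row(79, 34)
```

Rounded to two decimal places, `row_60` reproduces the tabulated values of 0.77, 0.92, 0.70, 0.94 and 0.89.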

Under optimum conditions, a genetic test is both highly sensitive and highly specific with a high PPV, NPV and accuracy so that all individuals tested are correctly identified as having or not having the trait of interest. However, in typical circumstances the selection of an optimal risk cutoff value may be based, e.g., on the best combination of specificity, sensitivity, PPV, NPV and accuracy, or a subset thereof. As shown in Table 1, using a high risk cutoff value increases the specificity and PPV of the test while lowering the sensitivity and NPV. Therefore, if a genetic test to determine the predisposition of an individual for developing a disease is based on a high risk cutoff value, very few individuals would be misdiagnosed as having a high risk of developing the disease, but a large proportion of those that do have a high risk would not be identified. On the other hand, using a low risk cutoff value increases the sensitivity and NPV while lowering the specificity and PPV, whereby although most or all individuals at high risk would be identified as such, a significant number of individuals at low risk would also be erroneously identified as being at high risk. Therefore, it is apparent that neither of these extremes is useful, but instead a balance of sensitivity, specificity, PPV and NPV may be determined for the particular trait, population and individual under consideration.

Determination of a threshold value is dependent on many factors. For example, clinical knowledge of the disease is typically required to make this determination. Further, a threshold value for a polygenic test may be regulated by a regulatory agency (e.g. FDA) or varied by a clinician depending on, for example, information regarding the potential treatments, characteristics of the polygenic test, or the specifics for a particular patient. Further, a threshold value may or may not be used in a dichotomous fashion. For example, an individual's treatment may vary depending on whether the individual's score is greater than the threshold value (e.g. administer a drug) or less than or equal to the threshold value (e.g. don't give the drug). Alternatively, individuals with scores close to the threshold may be treated differently than those with scores far from the threshold. For example, a decision may be made by a clinician to administer the drug to an individual with a score that is slightly below the threshold based on additional factors, such as clinical knowledge and input from the individual. Further, the use of “greater than” versus “less than or equal to” with regard to comparing a score to a threshold value is merely a matter of convention, and in alternative embodiments of the present invention the use of “greater than or equal to” versus “less than” may be used instead, as will be clear to one of ordinary skill in the art.

In one aspect, determining a threshold value is dependent on the severity of the disease. For example, if the trait relates to the development of a severe disease, then one would prefer to have a very high sensitivity despite a lower specificity since identifying those at high risk is critical for those individuals. For example, treatable malignancies (in situ cancers or Hodgkin's disease) should be found early, so sensitive tests should be used in the diagnostic work-up. Similarly, a test with a high NPV is preferred for a severe disease to ensure that the number of false negatives is low. Since the number of false positives may be significant due to a less than ideal PPV, additional testing may be performed to confirm the status of those individuals who tested positive/affected, using, e.g., a highly accurate “gold standard” test. As such, it may be more acceptable to have a lower PPV when there are other confirmatory diagnostics readily available. For example, the rate of atypical cervical cells in the general population is approximately 1/1000 and the sensitivity and specificity of a Pap test are 0.70 and 0.90, respectively. Based on these values, the PPV and NPV for the Pap test are 0.00696 and 0.999, respectively, meaning that a person with a positive Pap test has only a very small likelihood of truly having atypia, while a person with a negative Pap test almost certainly is disease-free.

In certain aspects, a high specificity and PPV are preferred for a genetic test, e.g. when there are highly undesirable repercussions for false positive test results. For example, if the test is being used to make a decision on whether an individual will receive a dangerous treatment regimen (transplant surgery, chemotherapy, radiation, drug with serious adverse events, mastectomy, etc.), then it is important that individuals who are identified by the test as needing the treatment actually do need the treatment. For example, a genetic test may be developed for identification of individuals who are at high risk of death in the absence of a heart transplant procedure. Thus, individuals who have a score higher than a threshold value are identified as likely to die unless they receive a new heart. Such a test would be preferred to have a very high PPV (~1.0) so that only individuals with a high probability of death are considered for a heart transplant. Although this would mean that a significant number of individuals that will die without a heart transplant will be excluded from the treatment (lower NPV), optimally no individuals will be given a heart transplant who do not absolutely need one.

Another factor in determining an appropriate threshold value for a genetic test is the prevalence of the disease in the population as a whole. For example, take a trait that is extremely rare in the population. A specificity of 0.95 may seem acceptably high, but it means that five percent of individuals who do not have a high risk will be misdiagnosed as having a high risk of developing the trait. Thus, for a trait that has a frequency in the population of 1/10,000, approximately 500 individuals would be misdiagnosed as “high risk” (false positives) for every individual that is correctly identified as being at risk of developing the trait. Accordingly, it is best to use a cutoff with a higher specificity for rare, non-severe traits and a cutoff with a higher sensitivity for common, severe traits. Further, as described above, PPV and NPV are highly dependent on the prevalence of the trait of interest. For example, the PPV of a genetic test used to identify individuals at risk of developing a disease from a population that has a low prevalence of the disease will be lower than the PPV of the same genetic test used to identify individuals at risk of developing the disease from a population that has a high prevalence of the disease. Similarly, the NPV of a genetic test used to identify individuals at risk of developing a disease from a population that has a low prevalence of the disease will be higher than the NPV of the same genetic test used to identify individuals at risk of developing the disease from a population that has a high prevalence of the disease. As such, although a genetic test may have a very high PPV (or a very high NPV) when being used to test individuals in one population, it may not be useful in other populations where the prevalence of the trait of interest is different, and therefore a different threshold value may be chosen for different populations depending on the prevalence of the trait of interest. In short, one skilled in the art can select threshold values to achieve one or more clinically useful parameters, such as sensitivity, specificity, PPV, NPV, accuracy, and the like for a patient population having a particular prevalence for a given trait using not only the methods presented herein, but also clinical knowledge and intuition, as well as, e.g., interactions with regulatory agencies such as the FDA.
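The rare-trait arithmetic above can be checked with a short sketch. The function name is hypothetical, and the sensitivity of 1.0 used in the example is an assumption: the text does not state a sensitivity, but the figure of about 500 false positives per true positive corresponds to every affected individual being detected.

```python
def false_positives_per_true_positive(prevalence, sensitivity, specificity):
    """Expected number of false positives for each true positive."""
    true_positive_rate = sensitivity * prevalence                   # affected and positive
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)  # unaffected but positive
    return false_positive_rate / true_positive_rate


# Trait frequency of 1/10,000, specificity of 0.95, assumed sensitivity of 1.0.
ratio_rare = false_positives_per_true_positive(1 / 10_000, 1.0, 0.95)
```

Raising the prevalence while holding the test characteristics fixed lowers this ratio sharply, which is one way to see why a different threshold value may suit populations with different prevalences.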

In one aspect of the present invention, a threshold value for a polygenic test using the associated SNPs is determined using a ROC (receiver operating characteristic) curve (Hanley et al. (1982) Radiology 143:29-36; and Beck, et al. (1986) Arch. Pathol. Lab. Med. 110:13-20) based on the sensitivities and specificities calculated for the risk cutoff values. A ROC curve is related to the inherent tradeoff between the sensitivity and specificity of a genetic test, and is generated by plotting the sensitivity as a function of one minus the specificity for each risk cutoff value, as shown in FIG. 1, which illustrates a ROC curve generated using the data from Table 1. Thus, each risk cutoff value corresponds to a “data point” on the ROC curve. The area under the curve provides a measure of the reliability of the genetic test. For a genetic test that can perfectly distinguish between affected and unaffected individuals (sensitivity and specificity are each 1), the area under the curve is 1. For a genetic test that fails to distinguish between affected and unaffected individuals, the area under the curve is 0.5. In general, the closer the curve follows the left-hand and top borders of the plot, the more accurate the genetic test, and the closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test. Computer programs commonly used for analyzing ROC curves are publicly available and include ROCKIT, CORROC2, LABROC4, ROCFIT, CLABROC, ROCPWR, LABMRMC, and PROPROC, all of which may be downloaded from Kurt Rossman Laboratories for Radiological Image Research at the following website: www-radiology.uchicago.edu/krl/KRL_ROC/software/_index.htm#ROC%20calculations%20Auxiliary%20software. In certain embodiments, a threshold value is chosen from the risk cutoff values whose data points are found in a portion (e.g. percentage) of the ROC curve that is nearest the upper left corner of the plot. For example, if data points were chosen from the 20% of the ROC curve nearest the upper left corner of the plot shown in FIG. 1 (between arrows A and B), then a threshold value would be selected from the data points corresponding to risk cutoff values of 55 and 60, indicated as D and E, respectively. In other embodiments, a threshold value is determined to be the risk cutoff value whose sensitivity and specificity is represented by the data point nearest the upper left corner of the plot. In FIG. 1, this data point (D) corresponds to a risk cutoff value of 55. In still further embodiments, a threshold value is determined from the location on the ROC curve that is closest to the upper left corner of the plot. In FIG. 1, this location is indicated as C, and corresponds to a sensitivity of about 0.87 and a specificity of about 0.84. In this embodiment, a risk cutoff value is determined that corresponds to the sensitivity and specificity represented by this location on the curve, and that risk cutoff value is used as the threshold value for a genetic test using the associated SNPs. For example, since the location C is between the data points D and E, the optimal risk cutoff value to use as a threshold value must be between 55 and 60. To determine the optimal risk cutoff value, the sensitivity and specificity are determined for all risk cutoff values in that range based on the scores of the case and control groups (see Table 2). The risk cutoff value whose sensitivity and specificity are closest to 0.87 and 0.84, respectively, is chosen, and in this example that risk cutoff value is 56, with a sensitivity of 0.88 and a specificity of 0.84. Therefore, 56 is chosen as the threshold value for a polygenic test using the associated alleles.
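The selection of the data point nearest the upper left corner can be sketched using the counts from Table 1. Two choices in this sketch are assumptions rather than part of the described methods: nearness is measured as Euclidean distance to the corner (1−specificity=0, sensitivity=1), and the area under the curve is estimated with the trapezoidal rule after adding the endpoints (0,0) and (1,1). The names `COUNTS`, `roc_points`, `nearest_to_corner` and `auc_trapezoid` are likewise hypothetical.

```python
import math

N_CASES, N_CONTROLS = 102, 405
# cutoff -> (# cases above cutoff, # controls above cutoff), from Table 1.
COUNTS = {80: (25, 0), 70: (51, 2), 65: (65, 8), 60: (79, 34), 55: (93, 81),
          50: (99, 154), 40: (102, 318), 30: (102, 394), 20: (102, 405)}


def roc_points(counts, n_cases, n_controls):
    """Map each cutoff to its ROC data point (1 - specificity, sensitivity)."""
    return {c: (fp / n_controls, tp / n_cases) for c, (tp, fp) in counts.items()}


def nearest_to_corner(points):
    """Cutoff whose data point has the smallest Euclidean distance to (0, 1)."""
    return min(points, key=lambda c: math.hypot(points[c][0], 1.0 - points[c][1]))


def auc_trapezoid(points):
    """Trapezoidal estimate of the area under the ROC curve."""
    curve = sorted(set(points.values()) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


points = roc_points(COUNTS, N_CASES, N_CONTROLS)
best_cutoff = nearest_to_corner(points)
area = auc_trapezoid(points)
```

Under these assumptions the nearest tabulated data point is the one for a risk cutoff value of 55, in agreement with data point D in the text; refining between 55 and 60 would require the per-score data referred to as Table 2.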

In another embodiment of the present invention, a threshold value may be chosen based on a specific desired clinical result. For example, a genetic test may be developed to stratify a patient population as a means for reducing the incidence of adverse events in individuals given a particular therapeutic. For example, a drug may be approved for limited use due to a 4% incidence of adverse events, but could be approved for wider use if the incidence of adverse events was lowered by at least 50%. In this example, “cases” are individuals that would have the adverse event in response to the drug and “controls” are individuals who would not have the adverse event when exposed to the drug. The risk that an individual will experience the adverse event is determined by computing a score for the individual based on their genotypes at a set of associated loci, and then e.g. comparing the score to a threshold value for a genetic test, where the threshold was determined by analysis of the PPV, NPV, sensitivity, specificity, etc., or some combination thereof for a genetic test based on the scores of a case group and a control group. For example, individuals with a number of associated alleles higher than a threshold value may be identified as being at high risk of having the adverse event. Using a threshold value of 60 for the illustrative example values shown in Table 1 would eliminate 77% of cases and 8% of controls. Since the incidence of adverse events is known to be 4%, a patient population of 1000 would have ~40 cases, about 31 (77%) of which would have >60 associated alleles and about 9 of which would have ≤60 associated alleles. The same patient population would have ~960 controls, about 77 (8%) of which would have >60 associated alleles and about 883 of which would have ≤60 associated alleles. After excluding the 108 individuals with >60 associated alleles, the incidence of an adverse event in the 892 individuals that were not excluded may be computed: (9/892)×100=1%.
The incidence of an adverse event in the individuals that were excluded can be similarly computed: (31/108)×100=29%. Using the same computational methods, risk cutoff values of 59 and 61 were also evaluated as threshold values for the diagnostic test. A risk cutoff value of 59 resulted in a predicted incidence of the adverse event in the individuals that were not excluded from treatment of 1%, but more individuals in the control group were excluded (92), meaning that more individuals not at risk of the adverse event would be denied treatment with the drug if this risk cutoff value were used as a threshold value in a diagnostic test. A risk cutoff value of 61 resulted in a predicted incidence of the adverse event in the individuals that were not excluded from treatment of 1.2%, which is higher than that for a risk cutoff value of 60; however, fewer control individuals were excluded (69), meaning that more individuals not at risk of the adverse event would be able to benefit from the drug treatment if this risk cutoff value were used as a threshold value in a diagnostic test. Further, if a practitioner wanted to maximize the number of controls that were treated while keeping the risk of adverse events at or below 2% in the treated population, a threshold value of 69 would exclude only 10 of the control individuals and would provide a treated population with a risk of the adverse event at 2%. Further, as shown in Table 2, the sensitivity need only be 0.53 for the test to identify enough cases for removal from the group of patients to be treated to bring the risk of the adverse event down to 2%.
Therefore, using the risk cutoff value of 69 as a threshold value in such a diagnostic test would decrease the incidence of adverse events in the population of individuals treated by the particular therapeutic, thereby improving its risk/benefit profile and allowing its label to be broadened, while maximizing the total number of individuals who are not at risk of the adverse event that will be included in the treatment. Clearly, the choice of a particular risk of adverse events in the treated population is an important factor in determining a threshold value for such a diagnostic test, and that level of risk must be determined by the clinician in concert with any regulatory agencies that would be involved in the approval of such a diagnostic (e.g. FDA). For example, if a 1% risk of adverse event was desired, a threshold value of 60 could be chosen, which would increase the NPV of the test (thereby reducing the actual number of adverse events in the treated population) while sacrificing PPV (more individuals who could benefit (controls) would be excluded). Patients who are excluded could be treated differently, e.g. with a different drug, or could be given the drug along with close monitoring for the adverse event, or with another treatment or agent that would counteract the adverse event.

TABLE 2

Risk Cutoff   # Cases        # Controls
Values        (out of 102)   (out of 405)   Sensitivity   Specificity   PPV    NPV    Accuracy
69            54             4              0.53          0.99          0.93   0.89   0.90
61            74             29             0.73          0.93          0.72   0.93   0.89
60            79             34             0.77          0.92          0.70   0.94   0.89
59            80             39             0.78          0.90          0.67   0.94   0.88
58            81             44             0.79          0.89          0.65   0.95   0.87
57            86             53             0.84          0.87          0.62   0.96   0.86
56            90             64             0.88          0.84          0.58   0.97   0.85
55            93             81             0.91          0.80          0.53   0.97   0.82
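The stratification arithmetic in the adverse-event example can be reproduced with a short Python sketch. The function name and the returned field names are hypothetical; the inputs (a population of 1000, a 4% incidence, and the sensitivity and specificity for each risk cutoff value) come from the example and from Table 2. The sketch works with unrounded expected counts, so its results differ slightly from the rounded figures quoted in the text.

```python
def stratify(population, incidence, sensitivity, specificity):
    """Expected outcome of excluding everyone who scores above the threshold.

    Cases are individuals who would have the adverse event; controls would not.
    Returns expected counts and incidences as a plain dictionary.
    """
    cases = population * incidence
    controls = population - cases
    cases_excluded = cases * sensitivity              # cases flagged by the test
    controls_excluded = controls * (1 - specificity)  # controls flagged by the test
    cases_treated = cases - cases_excluded
    controls_treated = controls - controls_excluded
    treated = cases_treated + controls_treated
    excluded = cases_excluded + controls_excluded
    return {
        "treated": treated,
        "excluded": excluded,
        "controls_excluded": controls_excluded,
        "incidence_treated": cases_treated / treated,
        "incidence_excluded": cases_excluded / excluded,
    }

if __name__ == "__main__":
    # Sensitivity and specificity per risk cutoff value, from Table 2.
    for cutoff, sens, spec in [(59, 0.78, 0.90), (60, 0.77, 0.92),
                               (61, 0.73, 0.93), (69, 0.53, 0.99)]:
        r = stratify(1000, 0.04, sens, spec)
        print(cutoff,
              f"treated incidence {100 * r['incidence_treated']:.2f}%",
              f"controls excluded {r['controls_excluded']:.0f}")
```

For a threshold value of 60 the sketch gives an incidence of about 1% among treated individuals and about 29% among excluded individuals; for 69 the treated incidence stays just under 2%.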

The concepts of sensitivity, specificity, PPV, NPV, accuracy, likelihood ratios, and ROC curves, and methods of choosing an appropriate threshold value for a diagnostic test are widely used and well known to those of skill in the art (see, for example, Janssens, et al. (2004) Am. J. Hum. Genet. 74:585-588; www.bamc.amedd.army.mil/DCI/articles/dci10972.htm; Baum M. (1995) Lancet 346:436-437; Forrest P. (1990) “Breast Cancer: the decision to screen”; Nuffield Provincial Hospitals Trust; Morrison, A. S. (1985) “Screening in Chronic Disease” Oxford University Press Inc. USA; www.genome.gov/10002404; med.usd.edu/som/genetics/curriculum/11TEST7.htm; Bauman A. (1990) Australian Prescriber 13:62-64; Walker et al. (1986) Med. J. Aust. 145:185-187; Gilbert R. (2001) Western J. Med. 174:405-409; Frohna, J. G. (2001) “Fostering the Efficient, Effective Use of Evidence-based Medicine in the Clinic”, 2nd edition, University of Michigan; Raglans, R. A. (2000) “Studying a Study and Testing a Test”, 4th edition, Lippincott Williams & Wilkins; www.cebm.net/likelihood_ratios.asp; and wwwl.elsevier.com/gej-ng/10/22/71/52/140/article.html). For example, in one study the best threshold value for serum alpha-fetoprotein to discriminate between liver cirrhosis and hepatocellular carcinoma was evaluated based on the area under a ROC curve, likelihood ratios, sensitivity, specificity, PPV and NPV (Soresi et al. (2003) Anticancer Res. 23(2C):1747-1753). In another study, mammography, sonography, and MR mammography were compared to determine if one or a combination of two or more of these techniques would provide the best results for detection of invasive cancer and multifocal disease using the measures of sensitivity, specificity, PPV, NPV and accuracy (Malur et al. (2001) Breast Cancer Res. 3:55-60). The combination of all three imaging techniques led to the best results with a sensitivity of 0.994, a specificity of 0.953, a PPV of 0.939, an NPV of 0.996 and an accuracy of 0.97.
In yet another study, the area under ROC curves for two clinical tests was compared to determine whether one of the tests or a combination of both of the tests was most accurate at identifying the class of a breast lesion (Buscombe et al. (2001) J. Nuc. Med. 42(1):3-8). In another study, it was found that prostate-specific antigen (PSA) testing for detecting prostate cancer had a sensitivity of 0.86 and a specificity of 0.33 for a cutoff of 4 ng/ml of PSA, but that lowering the cutoff to 2 ng/ml of PSA increased the sensitivity to 0.95 while lowering the specificity to 0.20 (Hoffman, et al. (2002) BMC Fam. Pract. 3(1):19). Once all risk cutoff values are examined and their respective specificities, sensitivities, PPVs, NPVs, LR+ and LR− values, and accuracies (or some subset thereof) are calculated, an optimal balance between these parameters, or some subset thereof, may be used in the determination of a threshold value. One skilled in the art may choose a threshold value that optimizes any of these measures, or a combination thereof, to achieve a clinically useful means of stratifying a patient population for e.g. diagnosis, prognosis, pharmacogenomics, drug development, theranostics and the like.
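As an illustration of how these measures follow from the case and control counts at a given risk cutoff value, the Python sketch below recomputes one row of Table 2 using the standard definitions of sensitivity, specificity, PPV, NPV, accuracy, LR+ and LR−. The function and field names are hypothetical. It assumes that the "# Cases" and "# Controls" columns of Table 2 count the individuals whose scores exceed the cutoff, which is consistent with the tabulated values.

```python
def cutoff_metrics(cases_flagged, controls_flagged, n_cases=102, n_controls=405):
    """Standard test measures from the counts of cases and controls above a cutoff."""
    tp = cases_flagged                # cases correctly flagged
    fp = controls_flagged             # controls incorrectly flagged
    fn = n_cases - cases_flagged      # cases missed
    tn = n_controls - controls_flagged
    sensitivity = tp / n_cases
    specificity = tn / n_controls
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (n_cases + n_controls),
        # LR+ is undefined when specificity is exactly 1; report infinity then.
        "lr_pos": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "lr_neg": (1 - sensitivity) / specificity,
    }

if __name__ == "__main__":
    # Risk cutoff value 60 in Table 2: 79 of 102 cases and 34 of 405 controls.
    for name, value in cutoff_metrics(79, 34).items():
        print(f"{name}: {value:.2f}")
```

Running the same function over every row gives the set of candidate values from which a threshold may be chosen by whichever balance of measures suits the clinical goal.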

In certain embodiments of the present invention, more than one threshold value may be determined and used to classify an individual's risk of exhibiting a multifactorial trait. In one such embodiment, a first threshold value chosen may be based on optimization for sensitivity, which will reduce the number of individuals who are at high risk but are not identified by the test (false negatives). Individuals that test “positive” in a genetic test using the first threshold value are then subjected to the same genetic test using a second threshold value that may be based on optimization for specificity. This second threshold value will reduce the number of individuals who test positive but who are not really at high risk (false positives). Using two such threshold values sequentially may serve to increase the accuracy of the method.

Another embodiment of the present invention in which more than one threshold value is determined and used to classify an individual's risk of exhibiting a multifactorial trait is one in which a plurality of threshold values are used simultaneously in the same genetic test. In such a test, an individual's risk is determined based on which threshold values the individual's score was greater than, less than, or equal to. In one embodiment at least about two thresholds are used, or at least about five thresholds are used, or at least about 10 thresholds are used. In certain embodiments, every possible score for a given polygenic test is used as a threshold; in other embodiments a subset of possible scores is used, wherein said subset may encompass a specific range of scores or may include scores chosen from across the entire range of scores. For example, a first threshold may be chosen such that individuals that have a score higher than the first threshold are classified as highly likely to develop a disease and are therefore treated with an appropriate drug to prevent onset. A second threshold may be chosen such that individuals that have a score lower than the second threshold are classified as having a very low likelihood of developing the disease and are therefore not treated to prevent onset. Those individuals with a score that is between the first and second thresholds may be classified as having an intermediate likelihood of developing the disease and may therefore be treated differently than individuals with a score higher than the first threshold or lower than the second threshold, e.g. they may not be given the drug but may be monitored more closely to detect onset of the disease should it occur. The treatment of individuals with the intermediate risk may rely more heavily on other information, such as clinical information about the disease, polygenic test, drug, patient, etc., than does the treatment of individuals who do not have an intermediate risk (i.e. are at “high” or “low” risk).
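The two-threshold example can be written as a minimal Python sketch. The function name, the category labels, and the threshold values used in the demonstration (69 and 55) are hypothetical and chosen only for illustration; scores exactly equal to a threshold are assigned to the intermediate class here, which is one possible reading of the example.

```python
def classify_risk(score, first_threshold, second_threshold):
    """Classify a score using an upper (first) and a lower (second) threshold.

    Scores above the first threshold are "high", scores below the second
    threshold are "low", and everything else is "intermediate".
    """
    if second_threshold > first_threshold:
        raise ValueError("second (lower) threshold must not exceed the first")
    if score > first_threshold:
        return "high"
    if score < second_threshold:
        return "low"
    return "intermediate"

if __name__ == "__main__":
    # Hypothetical thresholds; not values prescribed by the text.
    for score in (72, 60, 50):
        print(score, classify_risk(score, first_threshold=69, second_threshold=55))
```

Additional thresholds could be handled the same way by checking the score against a sorted list of boundaries.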

Although a set of associated loci may be identified by an association study, not all of the associated loci need be used in a single polygenic test. Once a set of associated loci is identified, one may adjust the number of associated loci to be used in a polygenic test and analyze the value of the test, e.g., with regard to its sensitivity, specificity, relative risk, likelihood ratio, PPV, NPV, accuracy, or a combination thereof. For example, in certain embodiments, a high relative risk in combination with a high sensitivity is preferred. In one aspect, the methods of the present invention may be used to determine a subset (e.g., at least about 5, 10, 15, 20, 30 or 50) of associated loci to be used in a polygenic test. For example, the associated loci with the greatest allele frequency differences between the case group and the control group may be selected. In some embodiments, only those loci with allele frequency differences of at least about 8% (0.08), 10% (0.1), 15% (0.15), or 25% (0.25) are chosen for use in a polygenic test. In some embodiments, the subset of associated loci to be used in a polygenic test is determined by analyzing certain characteristics of the resultant polygenic test using the genotyping data from the case and control groups. For example, sensitivity, specificity, relative risk, likelihood ratio, PPV, NPV, accuracy, or a combination thereof may be determined for a hypothetical polygenic test using a given subset of associated loci. A plurality of such hypothetical polygenic tests may be analyzed in this manner and the subset of associated loci that in combination result in the polygenic test with the best combination of these characteristics may be chosen.
As in determining an appropriate threshold value as discussed above, the best combination of sensitivity, specificity, relative risk, likelihood ratio, PPV, NPV, accuracy or a subset thereof for a polygenic test is dependent on many clinical factors including, e.g., the severity of the phenotype, the prevalence of the phenotype, and other clinical information that is population-specific or patient-specific. In certain embodiments, a subset of associated loci to be used in a polygenic test is determined based on a combination of the allele frequency differences for the associated loci and the characteristics of the resulting polygenic test. Thus, using the methods of the present invention, one may predict the characteristics of a polygenic test using a subset of associated loci without performing a case-control study using only that subset to measure such characteristics.

This aspect of the present invention has important practical implications. For example, if certain associated loci do not replicate in a second validating association study, they may be removed from the set of associated loci to be used in a polygenic test, and the characteristics of the polygenic test without the “nonreplicating” loci may be determined without performing another association study. Further, a polygenic test that requires a large number of loci to be genotyped is more expensive to perform than a polygenic test that requires a small number of loci to be genotyped. Thus, the ability to reduce the number of associated loci in a polygenic test while maintaining specific desired characteristics (e.g., sensitivity, relative risk, etc.) has direct implications for the affordability of performing such a test, and therefore for the practical applicability of such a test.

Identification of Individuals at Risk of Developing a Multifactorial Disease

Once one or more threshold values have been determined, an individual (“test individual”) who is not a member of the case or control groups may be examined to determine the risk that the individual will develop or exhibit the trait of interest. In certain embodiments of the present invention, the test individual is of the same species as the individuals in the case and control groups. The test individual is genotyped at each of the associated SNP loci (or a subset thereof, as described above). A score is calculated for the test individual based on their genotype at each of the SNP loci in the same manner as scores were calculated for the individuals in the original case and control groups. In one embodiment of the present invention, the calculated score for the test individual is compared to one or more threshold values to determine whether or not that individual is likely to exhibit the disease. For example, if a test individual has a score greater than a first threshold value, it may be considered likely the test individual will develop or exhibit the disease, and if the test individual's score is equal to or less than a second threshold value, the test individual may be considered to be at low risk of developing the disease. The first and second threshold values may be the same or different values. For example, in an embodiment in which 55 is chosen as both the first and second threshold, then a test individual having a score greater than 55 may be diagnosed as likely to develop the disease, and a test individual having a score of 55 or less may be diagnosed as unlikely to develop the disease. Further, based on the prevalence of the disease and the sensitivity and specificity of the genetic test, one may calculate the probability or likelihood that a person who is identified as at high risk by the test actually has or will develop the disease (e.g. post-test odds, as discussed below).
Likewise, one may calculate the probability or likelihood that a person who is identified as at low risk by the test actually does not have and will not develop the disease.
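The genotype-to-score-to-threshold flow described above might be sketched in Python as follows. The scoring rule used here, counting the number of associated alleles carried across the biallelic loci (0, 1 or 2 per locus), is an assumption for illustration; the score must in practice be computed in the same manner as for the case and control groups. The locus names, alleles and genotypes are hypothetical, and a real test with a threshold of 55 would involve many more loci than the three shown.

```python
def allele_count_score(genotypes, associated_alleles):
    """Count associated alleles over all loci (assumed scoring rule).

    genotypes: locus -> two-character genotype string, e.g. "AG".
    associated_alleles: locus -> the allele associated with the trait.
    """
    return sum(genotypes[locus].count(allele)
               for locus, allele in associated_alleles.items())

def likely_to_exhibit(score, threshold):
    """True if the score exceeds the threshold, False if equal to or below it."""
    return score > threshold

if __name__ == "__main__":
    # Hypothetical loci and alleles, for illustration only.
    associated = {"snp1": "A", "snp2": "G", "snp3": "T"}
    test_individual = {"snp1": "AA", "snp2": "AG", "snp3": "CC"}
    score = allele_count_score(test_individual, associated)
    print("score:", score)
    print("above threshold of 2:", likely_to_exhibit(score, threshold=2))
```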

In another embodiment of the present invention, a relative risk is computed for a test individual to further analyze the likelihood that the individual will develop or exhibit the disease. Relative risk is a measure of how much a particular risk factor influences the risk of a specified outcome. For example, a relative risk of 2 associated with a risk factor means that persons with that risk factor have twice the risk of a specified outcome compared with persons without that risk factor. In one aspect, a relative risk for a disease is a fold-increase in risk relative to the risk of the trait (e.g. disease) in the general population. A relative risk is determined by calculating the ratio of the percentage of individuals in the case group to the percentage of individuals in the control group that meet or exceed a given score based on their genotypes at the set of SNPs that are associated with the disease. Using the data presented in Table 1, for example, the relative risk of an individual with a score of at least 65 is (0.64)/(0.02)=32, which means that the individual has a 32-fold increased risk of developing the disease based on their allelic composition at the associated SNP positions. To compare, the relative risk of an individual with a score of at least 70 is (0.5)/(0.005)=100, which means that the individual has a 100-fold increased risk of developing the disease based on their allelic composition at the associated SNP positions. In one aspect of the present invention, a score is calculated for a test individual based on their genotypes at the associated SNP loci, and the case and control groups are analyzed to determine what percentage of the case individuals and what percentage of the control individuals have a score that is at least as great as that of the test individual.
Next, the percentage of case individuals with a score at least as great as that of the test individual is divided by the percentage of control individuals with a score at least as great as that of the test individual to compute the relative risk for the test individual.
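The relative risk computation described above can be sketched in Python. The function names are hypothetical, and the score lists are synthetic stand-ins constructed only so that 64% of cases and 2% of controls score at least 65, matching the fractions quoted from Table 1; they are not the actual study data.

```python
def fraction_at_least(scores, value):
    """Fraction of a group whose score is at least the given value."""
    return sum(1 for s in scores if s >= value) / len(scores)

def relative_risk(case_scores, control_scores, test_score):
    """Case fraction divided by control fraction at or above the test score."""
    control_fraction = fraction_at_least(control_scores, test_score)
    if control_fraction == 0:
        raise ValueError("no control individual scores this high; ratio undefined")
    return fraction_at_least(case_scores, test_score) / control_fraction

if __name__ == "__main__":
    # Synthetic groups: 64 of 100 cases and 2 of 100 controls score 65.
    case_scores = [65] * 64 + [50] * 36
    control_scores = [65] * 2 + [50] * 98
    print(round(relative_risk(case_scores, control_scores, 65), 1))
```

The explicit error for a control fraction of zero reflects that the ratio is undefined when no control individual reaches the test individual's score; how to treat that case is left open here.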

As noted above, the relative risk provides a fold-increase in risk relative to the risk of the disease in the general population. Therefore, to determine the test individual's risk of developing the disease, the relative risk for the individual must be combined with clinical information regarding the prevalence of the disease. For example, if the disease has a prevalence of 1:100, then an individual with a relative risk of 32 has a probability of developing the disease of 32:100, or 0.32. However, for a disease that has a prevalence of 1:1,000,000, an individual with a relative risk of 32 has a probability of developing the disease of 32:1,000,000, or 0.000032. Thus, although the relative risks were the same in these two examples, the actual probability of developing the disease was very different for these two diseases. In certain aspects of the present invention, a test individual's risk of developing a multifactorial trait of interest is calculated by multiplying the relative risk determined for the individual by the prevalence of the multifactorial trait in the general population. Determination of relative risk is widely known and routinely performed by those of skill in the art (see Sackett, et al. (1991) Clinical Epidemiology: a basic science for clinical medicine (second edition) Little Brown, Boston).
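The two prevalence examples above reduce to a single multiplication, shown here as a minimal Python sketch with a hypothetical function name. The product is only meaningful as a probability while it stays at or below 1; the sketch raises an error otherwise, which is a design choice of this illustration rather than something the text specifies.

```python
def risk_from_relative_risk(relative_risk, prevalence):
    """Relative risk multiplied by the prevalence in the general population."""
    risk = relative_risk * prevalence
    if risk > 1:
        # Assumed handling: a product above 1 cannot be read as a probability.
        raise ValueError("relative risk x prevalence exceeds 1")
    return risk

if __name__ == "__main__":
    print(risk_from_relative_risk(32, 1 / 100))        # prevalence of 1:100
    print(risk_from_relative_risk(32, 1 / 1_000_000))  # prevalence of 1:1,000,000
```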

Further, the PPV and NPV of a genetic test can provide information regarding the risk that an individual has or will develop a disease based on the test result. For example, if an individual tests “positive” for the disease using a test with a PPV of 0.87 and an NPV of 0.99, then the individual has an 87% chance of having or developing the disease. Likewise, if another individual tests “negative” for the disease using the same test, then that individual has only a 1% chance of having or developing the disease.

Likelihood ratios use the sensitivity and specificity of a test to provide a measure of how much a particular test result changes the likelihood that a patient has or does not have a multifactorial trait of interest, as discussed above. The likelihood ratio (LR) of a positive test result (LR+) is calculated as the sensitivity divided by (1−specificity), and the LR of a negative test result (LR−) is calculated as (1−sensitivity) divided by the specificity. These LR values are multiplied by the pre-test odds to compute the post-test odds, which represents the chances that the individual has or will develop the multifactorial trait by incorporating information about the disease prevalence, the patient pool, and specific patient risk factors (pre-test odds) and information about the diagnostic test itself (LR). The post-test odds may be used to compute the post-test probability by dividing the post-test odds by (1+post-test odds). For example, if an individual who tests positive has a pre-test odds of one to 66 based on a prevalence of 1.5%, and the test has an LR+ of 6.6, then the post-test odds will be 0.1 and the post-test probability will be 0.09, meaning that the individual has a 9% chance of having the disease. Similarly, if an individual who tests negative has a pre-test odds of one to three and the test has an LR− of 0.09, then the post-test odds will be 0.03, corresponding to a post-test probability of 3% that the individual has the disease. In this way, likelihood ratios and prevalence of the multifactorial trait may be used to calculate a probability that an individual has or will develop a multifactorial trait of interest based on a given test result.
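The likelihood ratio arithmetic above can be checked with a short Python sketch. The function names are hypothetical; the formulas are the ones stated in the paragraph (LR+ = sensitivity/(1−specificity), LR− = (1−sensitivity)/specificity, post-test odds = pre-test odds × LR, post-test probability = odds/(1+odds)), and the inputs in the demonstration are the two worked examples from the text.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test; LR+ is undefined when specificity is 1."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def odds_from_probability(probability):
    """Convert a probability (e.g. a prevalence) to odds."""
    return probability / (1 - probability)

def post_test_probability(pre_test_odds, likelihood_ratio):
    """Post-test odds = pre-test odds x LR; probability = odds / (1 + odds)."""
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

if __name__ == "__main__":
    # Positive result: pre-test odds of one to 66, LR+ of 6.6.
    print(round(post_test_probability(1 / 66, 6.6), 2))
    # Negative result: pre-test odds of one to three, LR- of 0.09.
    print(round(post_test_probability(1 / 3, 0.09), 2))
    # A prevalence of 1.5% corresponds to odds of roughly one to 66.
    print(round(1 / odds_from_probability(0.015), 1))
```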

Prognostic and Diagnostic Uses

Preventative measures are successful in preventing many different diseases, but these measures are only successful if individuals can be identified as at risk of developing the disease before onset of the disease. The onset of multifactorial diseases is especially difficult to predict due to the complex set of factors that influence their development. As such, individuals often do not know they are at risk of developing a multifactorial disease until it is too late to prevent it. It will be clear to one of skill in the art that the methods presented may serve as valuable tools for clinicians in making medical decisions regarding the care of their patients. The determination of risk is an important aspect of the clinical analysis of an individual used to determine whether or not medical interventions are warranted, and which interventions are most appropriate for a given individual (Bucher, et al. (1994) BMJ 309(6957):761-764; Forrow, et al. (1992) Am J Med 92(2):121-124).

In certain embodiments, the present invention provides methods for identifying individuals at risk of developing a disease (prognostics), thereby allowing implementation of measures to prevent or delay the onset of the disease. In one embodiment, an individual's risk of developing a given disease may be determined by comparing a score based on the individual's genotype at a set of disease-associated SNPs to at least one threshold value. If the individual's score exceeds a threshold value, the institution of preventative measures (e.g., radiation or drug therapies) may be justified. In another embodiment, an individual's risk of developing a disease may be determined by calculating a relative risk for the individual and multiplying the relative risk by the prevalence of the disease. In another embodiment, the sensitivity, specificity, PPV, NPV, and/or accuracy of a genetic test is used to calculate an individual's risk of developing the disease. In yet another embodiment, the LR for the test is used to calculate the post-test odds/probability that the individual will develop the disease. In another embodiment, a combination of the above-described methods is used to determine an individual's risk of having or developing the disease. This information may be used by a clinician to better determine an appropriate treatment regimen for the individual. Often, this information is used in combination with clinical information regarding the disease, the patient, or the population from which the patient comes. In some aspects, the methods presented herein may also be used to identify individuals who are resistant to a disease. For example, some individuals who have a family history of a disease (e.g., breast cancer) never develop the disease.
Identifying such resistance could allow a better assessment of the risk of these individuals of developing the disease in question, provide peace of mind to those who are not at high risk, and in some cases preclude drastic prophylactic treatments (e.g., elective mastectomy). The methods presented herein may also be used to identify individuals with an increased risk of developing an adverse, non-disease condition and thereby motivate life-style changes to prevent onset of the condition. For example, a polygenic test comprising a set of SNPs associated with hypertension could provide strong incentive to those who are found to be at high risk to exercise and eat a healthy diet.

Some diseases are difficult to diagnose based solely on the physical symptoms apparent in a patient. The diagnosis of these diseases is often confounded by the variety of ways such a disease may manifest itself in different individuals, and/or the fact that its symptoms may be similar to those of a number of unrelated diseases. In a further aspect of the present invention, a set of SNPs associated with such a disease may be used to aid in the diagnosis of an individual who exhibits a phenotype that may be indicative of the disease. Thus, genotyping the individual for the set of associated SNPs and determining the individual's risk of exhibiting the disease could either support or argue against the diagnosis suggested by the physical symptoms. If the diagnosis was supported, a clinician could use this information to make treatment decisions for the individual, such as initiating a treatment regimen for the disease. For example, celiac disease is an autoimmune disorder of the digestive system that damages the small intestine and interferes with the absorption of nutrients from food. Specifically, celiac disease causes an inflammatory response in the small intestine in response to gluten, a protein found in wheat, rye, and barley, and the only treatment for celiac disease is a gluten-free diet. It is difficult to diagnose celiac disease because different individuals display different symptoms. For example, some will have primarily gastrointestinal symptoms such as distended abdomen or diarrhea, while others will have only irritability or depression. Further, the condition can be easily misdiagnosed because its symptoms are similar to those of many other conditions including irritable bowel syndrome, Crohn's disease, ulcerative colitis, diverticulosis, intestinal infections, chronic fatigue syndrome, and depression.
The methods presented herein may be used to identify a set of genetic loci associated with celiac disease, and these loci may be used to screen individuals who display symptoms indicative of celiac disease. Those individuals who are found to be at high risk of developing celiac disease based on their genetic composition may be diagnosed as having celiac disease and placed on a gluten-free diet.

In other embodiments, the methods presented herein may be used to aid in the determination of whether or not a prophylactic therapy is warranted to prevent development of e.g. a disease in an individual. For example, there are approved therapeutics for prevention of breast cancer that are dependent on historical clinical information such as family history, onset of first menstrual period, number of children, etc. These factors, although useful for computing a pre-test odds, are only marginally predictive of whether or not a woman will develop breast cancer. A genetic test to be used in combination with the pre-test odds would provide a far superior means of deciding whether or not to treat an individual prophylactically (e.g. with tamoxifen) by providing a much more accurate way to identify and quantify her risk of developing breast cancer.

In one aspect of the present invention, a prognostic or diagnostic assay is provided comprising a nucleic acid array that contains probes designed to detect the presence of the set of associated SNPs in a biological sample. Nucleic acids are isolated from a biological sample from a test individual and are hybridized to the probes on the nucleic acid array. The probe intensities are analyzed to provide a genotype for the test individual at each of the associated SNP positions. The genotypes are used to compute a score for the test individual, and the individual's risk of developing the disease is determined according to the methods presented herein.

The set of associated SNPs may further be used for identifying regions of the genome that are involved in development of the disease phenotype. These SNPs may be directly involved in the manifestation of the disease, or they may be in linkage disequilibrium with loci that are directly involved. For example, a disease-associated SNP may affect the expression or function of a disease-associated protein directly, or may be in linkage disequilibrium with another locus that affects the expression or function of the protein. Examples of direct effects on the expression or function of a protein include, but are not limited to, a polymorphism that alters the polypeptide sequence of the protein, and a polymorphism that occurs in a regulatory region (e.g., promoter, enhancer, etc.) resulting in the increased or decreased expression of the protein. In certain embodiments, genomic regions containing the set of associated SNPs are analyzed to identify genes that are directly involved in the biological basis of the disease (“identified genes”).

The associated SNPs that lie in the coding region of a gene may be used to detect or quantify expression of an associated allele in a biological specimen for use as a diagnostic marker for the disease. For example, nucleic acids containing the associated SNPs may be used as oligonucleotide probes to monitor RNA or mRNA levels within the organism to be tested or a part thereof, such as a specific tissue or organ, so as to determine if the gene encoding the RNA or mRNA contains an associated allele. In one aspect, a diagnostic or prognostic kit is provided that comprises oligonucleotide probes for use in detecting an associated allele in a biological sample. Likewise, if the associated allele causes a change in the polypeptide sequence of the encoded protein, the allelic constitution of the gene may be assayed at the protein level using any customary technique such as immunological methods (e.g., Western blots, radioimmune precipitation and the like) or activity based assays measuring an activity associated with the gene product. In one aspect, a diagnostic or prognostic kit is provided that comprises an assay for detecting a polypeptide encoded by an associated allele in a biological sample. The manner in which cells are probed for the presence of particular nucleotide or polypeptide sequences is well established in the literature and does not require further elaboration here; however, see, e.g., Sambrook, et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, New York) (2001).

Therapeutics

The set of associated SNPs may be useful for developing therapeutics for the prevention of disease. In one aspect, the identified genes may be used for gene therapy. For example, if an identified gene is found to be downregulated in individuals who exhibit the disease, then upregulation of the gene could be an effective strategy to prevent onset of the disease in test individuals. Upregulation of the identified gene may be accomplished by incorporating an allele of the gene that is not associated with the disease into an expression vector and further introducing the vector into an organism, thereby upregulating the expression of the gene in the organism. Such vectors generally have convenient restriction sites located near the promoter sequence to provide for the insertion of nucleic acid sequences in a recipient genome. Transcription cassettes may be prepared comprising a transcription initiation region, the target gene or fragment thereof, and a transcriptional termination region. The transcription cassettes may be introduced into a variety of vectors, e.g. plasmid; retrovirus, e.g. lentivirus; adenovirus; and the like, where the vectors are able to be transiently or stably maintained in the cells. The gene or protein product may be introduced directly into tissues or host cells by any number of routes, including viral infection, microinjection, or fusion of vesicles. Jet injection may also be used for intramuscular administration, as described by Furth, et al., Anal. Biochem., 205: 365-68 (1992). Alternatively, the DNA may be coated onto gold microparticles, and delivered intradermally by a particle bombardment device or “gene gun” as described in the literature (see, for example, Tang, et al., Nature, 356: 152-54 (1992)).

Proteins encoded by the identified genes may be targets for antibody therapy if there is an amino acid change in the sequence of the protein that is associated with a predisposition to the disease. For example, if an associated allele encodes a protein variant that is a causative factor for the disease, antibodies specific for the disease-associated protein variant may be administered to a patient as a means to inhibit the development of the disease. In certain embodiments, a combination of antibodies, each specific for a different disease-associated protein, may be administered to a patient to prevent onset of a disease.

Antisense molecules may be used to down-regulate expression of an associated allele of an identified gene in cells. An antisense molecule forms a duplex with the mRNA encoded by an allele of a gene, thereby down-regulating its expression and blocking translation of the corresponding protein. For example, an antisense reagent may be developed based on the sequence of the mRNA encoded by an associated allele. This antisense reagent may then be administered to a heterozygous patient (one who possesses one associated allele and one allele that is not associated with the disease) to decrease the expression of the associated allele, allowing the expression of the unassociated allele to predominate. The antisense reagent may be antisense oligonucleotides, particularly synthetic antisense oligonucleotides having chemical modifications, or nucleic acid constructs that express such antisense molecules as RNA. A combination of antisense molecules may be administered, where a combination may comprise multiple different sequences.

As an alternative to antisense inhibitors, catalytic nucleic acid compounds, e.g., ribozymes, anti-sense conjugates, etc., may be used to inhibit expression of associated alleles. Ribozymes may be synthesized in vitro and administered to the patient, or may be encoded on an expression vector, from which the ribozyme is synthesized in the targeted cell (for example, see International patent application WO 9523225, and Beigelman, et al., Nucl. Acids Res. 23: 4434-42 (1995)). Examples of oligonucleotides with catalytic activity are described in WO 9506764. Conjugates of antisense oligonucleotides with a metal complex, e.g. terpyridylCu(II), capable of mediating mRNA hydrolysis are described in Bashkin, et al., Appl. Biochem. Biotechnol. 54: 43-56 (1995).

An expressed protein encoded by an identified gene may be used in drug screening assays to identify ligands or substrates that bind to, modulate or mimic the action of that protein product, and thereby identify therapeutic agents to provide, for example, a replacement or enhancement for protein function in affected cells, or an agent that modulates or negates protein function. A wide variety of assays may be used for this purpose, including labeled in vitro protein-protein binding assays, protein-DNA binding assays, electrophoretic mobility shift assays, immunoassays for protein binding, and the like. The term “agent” as used herein describes any molecule, e.g., a protein or small molecule, with the capability of altering, mimicking or masking, either directly or indirectly, the physiological function of an identified gene or gene product. Generally, pluralities of assays are run in parallel with different concentrations of the agent to obtain a differential response to the various concentrations. Typically, one of these concentrations serves as a negative control, e.g., at zero concentration or below the level of detection. Also, all or a fragment of a purified protein variant may be used for determination of three-dimensional crystal structure, which can be used for determining the biological function of the protein or a part thereof, modeling intermolecular interactions, membrane fusion, etc.

Candidate agents encompass numerous chemical classes, though typically they are organic molecules or complexes, preferably small organic compounds, having a molecular weight of more than 50 and less than about 2,500 daltons. Candidate agents comprise functional groups necessary for structural interaction with proteins, particularly hydrogen bonding, and typically include at least an amine, carbonyl, hydroxyl or carboxyl group, and frequently at least two of the functional chemical groups. The candidate agents often comprise cyclical carbon or heterocyclic structures and/or aromatic or polyaromatic structures substituted with one or more of the above functional groups. Candidate agents are also found among biomolecules including, but not limited to: peptides, saccharides, fatty acids, steroids, purines, pyrimidines, derivatives, structural analogs or combinations thereof.

Candidate agents are obtained from a wide variety of sources including libraries of synthetic or natural compounds. For example, numerous means are available for random and directed synthesis of a wide variety of organic compounds and biomolecules, including expression of randomized oligonucleotides and oligopeptides. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are available or readily produced. Additionally, natural or synthetically produced libraries and compounds are readily modified through conventional chemical, physical and biochemical means, and may be used to produce combinatorial libraries. Known pharmacological agents may be subjected to directed or random chemical modifications, such as acylation, alkylation, esterification, amidification, etc., to produce structural analogs.

Where the screening assay is a binding assay, one or more of the molecules may be coupled to a label, where the label can directly or indirectly provide a detectable signal. Various labels include radioisotopes, fluorescers, chemiluminescers, enzymes, specific binding molecules, particles, e.g., magnetic particles, and the like. Specific binding molecules include pairs, such as biotin and streptavidin, digoxin and antidigoxin, etc. For the specific binding members, the complementary member would normally be labeled with a molecule that provides for detection, in accordance with known procedures. A variety of other reagents may be included in the screening assay. These include reagents like salts, neutral proteins, e.g., albumin, detergents, etc., that are used to facilitate optimal protein-protein binding and/or reduce non-specific or background interactions. Reagents that improve the efficiency of the assay, such as protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., may be used.

Agents may be combined with a pharmaceutically acceptable carrier or diluent, including any and all solvents, dispersion media, coatings, anti-oxidant, isotonic and absorption delaying agents and the like. The agent may be combined with conventional additives, such as lactose, mannitol, corn starch or potato starch; with binders, such as crystalline cellulose, cellulose derivatives, acacia, corn starch or gelatins; with disintegrators, such as corn starch, potato starch or sodium carboxymethylcellulose; with lubricants, such as talc or magnesium stearate; and if desired, with buffering agents, moistening agents, preservatives and flavoring agents. The use of such media and agents for pharmaceutically active substances is well known in the art, and such media and agents are readily available to the public. Moreover, pharmaceutically acceptable auxiliary substances, such as pH adjusting and buffering agents, tonicity adjusting agents, stabilizers, wetting agents and the like, are readily available to the public. Except insofar as any conventional media or agent is incompatible with the active ingredient, its use in the therapeutic compositions and methods described herein is contemplated. Supplementary active ingredients can also be incorporated into the compositions.

The following methods and excipients are merely exemplary and are in no way limiting. Identified agents of the invention can be incorporated into a variety of formulations for therapeutic administration. More particularly, the complexes can be formulated into pharmaceutical compositions by combination with appropriate, pharmaceutically acceptable carriers or diluents as discussed supra, and may be formulated into preparations in solid, semi-solid, liquid or gaseous forms, such as tablets, capsules, powders, granules, ointments, solutions, gels, microspheres, and aerosols. Additionally, agents may be formulated into preparations for injections by dissolving, suspending or emulsifying them in an aqueous or nonaqueous solvent, such as vegetable or other similar oils, synthetic aliphatic acid glycerides, esters of higher aliphatic acids or propylene glycol; and if desired, with conventional additives such as solubilizers, isotonic agents, suspending agents, emulsifying agents, stabilizers and preservatives. Further, agents may be utilized in aerosol formulation to be administered via inhalation. The agents identified by the methods presented herein can be formulated into pressurized acceptable propellants such as dichlorodifluoromethane, propane, nitrogen and the like. Alternatively, agents may be made into suppositories for rectal administration by mixing with a variety of bases such as emulsifying bases or water-soluble bases and can include vehicles such as cocoa butter, carbowaxes and polyethylene glycols, which melt at body temperature, yet are solid at room temperature.

Implants for sustained release formulations are well known in the art. Implants are formulated as microspheres, slabs, etc. with biodegradable or non-biodegradable polymers. For example, polymers of lactic acid and/or glycolic acid form an erodible polymer that is well-tolerated by the host. The implant containing identified agents may be placed in proximity to the site of action, so that the local concentration of active agent is increased relative to the rest of the body. Unit dosage forms for oral or rectal administration such as syrups, elixirs, and suspensions may be provided wherein each dosage unit, for example, teaspoonful, tablespoonful, gel capsule, tablet or suppository, contains a predetermined amount of the compositions of the present invention. Similarly, unit dosage forms for injection or intravenous administration may comprise the compound of the present invention in a composition as a solution in sterile water, normal saline or another pharmaceutically acceptable carrier. The specifications for the novel unit dosage forms depend on the particular compound employed and the effect to be achieved, and the pharmacodynamics associated with each active agent in the host.

Administration of the agents can be achieved in various ways. The formulation may be given orally, by inhalation, or may be injected, e.g. intravascular, intratumor, subcutaneous, intraperitoneal, intramuscular, etc. Agents may be topical, systemic, or may be localized by the use of an implant that acts to retain the active dose at the site of implantation. The dosage of the therapeutic formulation will vary, depending on the specific agent and formulation utilized, the nature of the disease, the frequency of administration, the manner of administration, the clearance of the agent from the host, and the like, such that it is sufficient to address the disease or symptoms thereof, while minimizing side effects. In some cases, oral administration will require a different dose than if administered intravenously. The compounds will be administered at an effective dosage such that over a suitable period of time the disease progression may be substantially arrested. The initial dose may be larger, followed by smaller maintenance doses. The dose may be administered as infrequently as once, weekly or biweekly, or fractionated into smaller doses and administered daily, semi-weekly, etc., to maintain an effective dosage level. Treatment may be for short periods of time, e.g., after ventricular fibrillation, or for extended periods of time, e.g., in the prevention of further episodes of ventricular fibrillation. It is contemplated that the composition will be obtained and used under the guidance of a physician for in vivo use.

Pharmacogenomics

In other embodiments, the set of associated SNPs identified by the methods of the present invention are used for pharmacogenomics and drug development. Due to the great number of treatment options available for common multifactorial diseases, it is often difficult to determine which of a group of treatment options will be most effective for a given patient. Typically, several different options must be tried before one is found that is safe and effective. In the meantime, the patient will continue to suffer the effects of the disease, and perhaps will also experience adverse events in response to one or more of the treatment options tested. The methods presented herein are useful for stratifying patient populations prior to initiation of a treatment regimen. Polymorphic loci are identified that are associated with the response of a patient to a drug or other medical treatment. The response may be an adverse event or may be related to the efficacy of the treatment. The associated loci are used to screen patient populations to generate genetic profiles relating to the associated loci for the patients that will help clinicians determine which individuals should be given the drug or medical treatment and which should not. For example, individuals who are predisposed to exhibiting an adverse event and individuals who are unlikely to have an efficacious response to a drug may be excluded from treatment with that drug, and may instead be treated by alternate means (different drug or other medical treatment).

In one such embodiment, individuals are screened for a set of SNPs that are associated with a disease that confers a known risk of an adverse response to a particular drug treatment. Those individuals at high risk of developing the disease are excluded from the treatment regimen. For example, individuals with LQTS (long QT syndrome) have a high risk of ventricular fibrillation when administered antiarrhythmia drugs. It would be beneficial to screen a patient population for a set of loci associated with LQTS prior to administering such a drug, and to exclude those individuals at high risk of developing LQTS. The set of SNPs associated with the disease is determined by performing an association study, and an individual's risk of developing the disease is determined as described above. A high risk of developing the disease may be considered a risk factor for adverse events in response to an antiarrhythmia drug and this information may be used by a clinician to determine appropriate treatment options for the individual. For example, if the individual has a high risk of developing the disease, then administration of the drug may be avoided. If the individual has a low risk of developing the disease, then administration of the drug may be a viable treatment option.

In another embodiment of the present invention, the effectiveness of a drug treatment regimen is predicted for an individual based on the genotypes of the individual at a set of SNPs associated with efficacy of the drug. This information is used to determine a probability of whether the drug will be an effective treatment for the individual, or if other drugs or treatment options should be considered instead. For example, an association study may be performed using a case group of individuals that do not have an efficacious response to the drug (“nonresponders”) and a control group of individuals that have an efficacious response (“responders”). Members of the case and control groups are genotyped at a plurality of SNP positions, relative allele frequencies are computed for each of the SNPs, and a set of SNPs associated with an efficacious response is identified as those SNPs that have allele frequency differences that are significantly different between the case and control groups. A score is calculated for each member of the case and control groups based on their genotypes at the associated SNPs, and these scores are used to determine one or more appropriate threshold values for a genetic test that will predict the risk of an individual not having an efficacious response to the drug. The determination of an appropriate threshold value may also include one or more of the following: clinical knowledge of the drug, the indication being treated, and the patient population, and calculation of sensitivity, specificity, PPV, NPV, accuracy, LR+ and LR− of the genetic test. An individual who is a candidate for receiving the drug is genotyped at each of the associated SNP positions, and a score is calculated for the individual based on his/her genotypes at the set of associated SNPs. If the individual has a score that is greater than a threshold value, the individual may be classified as likely to be a nonresponder, and alternative treatments may be considered. If the individual has a score equal to or less than a threshold value, the individual may be classified as likely to be a responder and administration of the drug may be recommended. In another embodiment, an individual's risk of being a nonresponder may be determined by calculating a relative risk for the individual and multiplying the relative risk by the prevalence of nonresponders based on the known efficacy of the drug. In another embodiment, the individual's likelihood of being a responder is computed using the accuracy, LR+, LR−, PPV and/or NPV of the polygenic test. This information can then be used by a clinician in deciding on appropriate treatments for the individual.
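The scoring and threshold comparison described above can be sketched as follows. This is a minimal illustration, not the scoring scheme of the invention: the additive 2/1/0 points per genotype, the SNP names, and the threshold of 3 are all assumptions introduced for the example. Because the case group here is the nonresponders, a higher score is taken to indicate a likely nonresponder.

```python
from typing import Dict

# Assumed encoding (hypothetical): points = number of associated alleles carried.
# "AA" = homozygous associated, "Aa" = heterozygous, "aa" = homozygous unassociated.
GENOTYPE_POINTS = {"AA": 2, "Aa": 1, "aa": 0}


def compute_score(genotypes: Dict[str, str]) -> int:
    """Sum the per-locus points over the set of associated SNPs."""
    return sum(GENOTYPE_POINTS[g] for g in genotypes.values())


def classify_response(score: int, threshold: int) -> str:
    """Score above the threshold -> likely nonresponder; otherwise likely responder."""
    return "likely nonresponder" if score > threshold else "likely responder"


# Hypothetical individual genotyped at four associated SNPs.
patient = {"snp1": "AA", "snp2": "Aa", "snp3": "aa", "snp4": "Aa"}
score = compute_score(patient)  # 2 + 1 + 0 + 1 = 4
print(score, classify_response(score, threshold=3))
```

A score exactly equal to the threshold falls on the responder side, matching the "equal to or less than" wording above.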

In a related embodiment, a diagnostic may be developed for a therapeutic area to enable a clinician to better individualize treatment of patients. Rather than focusing on a single drug, the therapeutic area diagnostic would provide information on the likelihood that a patient will be a responder for a series of drugs related to a single therapeutic area. For example, there are a multitude of drugs on the market for treating depression including SSRIs (selective serotonin reuptake inhibitors), TCAs (tricyclic antidepressants), MAOIs (monoamine oxidase inhibitors), and triazolopyridines. Association studies may be performed to identify polymorphic loci associated with the efficacy of each of these types of drugs, and those loci could then be used to screen patient populations to determine which class of drugs would be most efficacious for a given individual. For each drug, a case group comprises individuals with depression that had an efficacious response to the drug, and a control group comprises individuals that did not have an efficacious response to the drug. Associated SNPs are identified as those that have a significantly different allele frequency in the cases than in the controls. For each class of drug, thresholds are determined that will identify individuals with a high (e.g. >80%, or >90% or >95%, or >98%) chance of having an efficacious response. An individual in need of antidepressant therapy is screened for the SNPs that are associated with each of the drug types, and a clinician determines an appropriate therapy choice for the individual based on the individual's genotype information and the thresholds determined for each class of drug.
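As a rough illustration of identifying SNPs whose allele frequencies differ between cases and controls, the sketch below applies a Pearson chi-square test to a 2x2 table of allele counts. The choice of test, the 0.05 significance level, and the allele counts are assumptions made for this example; the text does not prescribe a particular statistic, and a real genome-wide study would typically need a much more stringent cutoff to account for multiple testing.

```python
from typing import Dict, List, Tuple

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841

AlleleCounts = Tuple[int, int]  # (count of allele A, count of allele a)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def associated_snps(counts: Dict[str, Tuple[AlleleCounts, AlleleCounts]]) -> List[str]:
    """Return SNP ids whose case/control allele counts differ significantly."""
    hits = []
    for snp, ((case_a, case_b), (ctrl_a, ctrl_b)) in counts.items():
        if chi_square_2x2(case_a, case_b, ctrl_a, ctrl_b) > CHI2_CRITICAL_DF1_P05:
            hits.append(snp)
    return hits


# Hypothetical allele counts: ((case A, case a), (control A, control a)).
example = {
    "snp1": ((120, 80), (80, 120)),   # clear frequency difference
    "snp2": ((100, 100), (98, 102)),  # essentially no difference
}
print(associated_snps(example))
```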

In a further related embodiment, SNPs associated with the efficacy of a drug may be used to improve the efficacy of the drug by stratifying patient populations to exclude probable nonresponders from treatment. In one example, ~32% of patients exposed to a drug are classified as responders. An association study is performed with a case group of responders and a control group of nonresponders, and 25 SNPs are found to be associated with the responder phenotype. Based on the scores calculated for the cases and controls it is found that 81% of responders and 40% of nonresponders have a score of >19. Therefore, using 19 as a threshold value to stratify a patient population prior to administering the drug improves the overall efficacy of the drug from ~32% to ~50%. In doing so, the number of nonresponders exposed to the drug is decreased substantially, and those excluded may then be treated with alternative therapies sooner. A change in efficacy of this magnitude could help to get a new drug approved, or could encourage wider use of an already approved drug.
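The improvement from ~32% to ~50% follows from the stated percentages: among treated patients (those scoring above 19), the responder fraction is the retained responders divided by everyone retained. The short calculation below reproduces that arithmetic; the function name is hypothetical and the inputs are the figures given in the example above.

```python
def efficacy_after_stratification(
    base_rate: float,
    frac_responders_kept: float,
    frac_nonresponders_kept: float,
) -> float:
    """Responder fraction among patients who pass the score threshold."""
    kept_responders = base_rate * frac_responders_kept
    kept_nonresponders = (1 - base_rate) * frac_nonresponders_kept
    return kept_responders / (kept_responders + kept_nonresponders)


# 32% responders overall; 81% of responders and 40% of nonresponders score > 19.
rate = efficacy_after_stratification(0.32, 0.81, 0.40)
print(f"{rate:.1%}")  # 48.8%, i.e. roughly 50%
```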

In yet another embodiment, the methods presented herein may be used to assess whether a brand name drug should be used, or if a cheaper generic may be substituted instead. For example, an association study would be performed to identify genetic loci associated with a positive clinical response to the generic alternative. Patients in need of treatment would then be genotyped at these associated loci and a score would be calculated. The individual's score would then be used to predict the efficacy of the generic drug in the individual, and a clinician would use this information to make a treatment decision for the individual. This application of the disclosed methods could be used for medical costs reimbursement decisions, as well. For example, if it was found that the generic drug was unlikely to be efficacious in individual A, then the brand name drug would be administered to A and the cost of the brand name drug could be reimbursed to A; however, if individual B was likely to have an efficacious response to the generic, then individual B would not be given the more expensive brand name drug, and only the cost of the generic would be reimbursable.

In another embodiment of the present invention, the risk that an individual will experience an adverse event in response to administration of a drug is determined based on the genotypes of the individual at a set of SNPs associated with the occurrence of adverse events related to the drug. If an individual is found to have a high risk of experiencing an adverse event in response to a treatment regimen, then the treatment regimen may be avoided and other treatment options may be considered. For example, an association study may be performed using a case group of individuals that exhibited an adverse event in response to the drug and a control group of individuals that did not exhibit the adverse event. Members of the case and control groups are genotyped at a plurality of SNP positions, relative allele frequencies are computed for each of the SNPs, and SNPs associated with the adverse event are identified as those SNPs that have allele frequency differences that are significantly different between the case and control groups. A score is calculated for each member of the case and control groups based on their genotypes at the associated SNPs, and these scores are used to determine one or more appropriate threshold values for a polygenic test that will predict the risk that an individual will experience an adverse event in response to the drug with appropriate levels of sensitivity, specificity, PPV, NPV, LR+, LR− and/or accuracy. As discussed above, the selection of a threshold value may also be based on clinical factors, such as the severity of the adverse event, the disease or disorder being treated, and the medical history of the individual being treated. For example, if the adverse event is death, then a high sensitivity is essential to identify those individuals who have a high probability of dying if administered the drug. Prior to receiving the drug, an individual is genotyped at each of the associated SNP positions, and a score is calculated for the individual based on his/her genotypes at the set of associated SNPs. For example, if the individual has a score that is greater than a threshold value determined from the scores of the case and control groups, the individual may be classified as likely to experience an adverse event if administered the drug, and use of the drug may be avoided. If the individual has a score equal to or less than a threshold value, the individual may be classified as not likely to suffer an adverse event and administration of the drug may be recommended. If the individual has a score less than or equal to one threshold value and greater than another threshold value, the individual may be classified as having an intermediate likelihood of experiencing an adverse event and alternative drug therapies may be used, or the drug may be administered e.g. only with close monitoring, or in combination with another therapeutic to counteract the adverse event. Determination of the best treatment regimen for an individual with an intermediate risk of experiencing an adverse event may rely more heavily on other information (e.g. clinical data, FDA or patient input, etc.) than does determination of the best treatment regimen for an individual with a very high or low risk. In another embodiment, an individual's risk of experiencing an adverse event may be determined by calculating a relative risk for the individual and multiplying the relative risk by the known prevalence of individuals experiencing adverse events. This information can then be used by a clinician in deciding on appropriate treatments for the individual. Adverse events in response to administration of a drug include, but are not limited to, allergic reactions, cardiac arrhythmia, stroke, bronchospasm, gastrointestinal disturbances, fainting, impotency, rashes, fever, muscle pain, headaches, nausea, birth defects, hot flashes, mood changes, dizziness, agitation, vomiting, sleep disturbance, somnolence, insomnia, addiction to the drug, and death.
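The classification with two threshold values described above can be sketched as a simple three-way comparison. The function name, the category labels, and the example thresholds of 15 and 19 are assumptions chosen for illustration only; in practice the thresholds would be derived from the case and control scores and clinical considerations.

```python
def classify_adverse_event_risk(score: int, lower: int, upper: int) -> str:
    """Three-way classification using two thresholds (lower <= upper).

    score > upper           -> "high" (use of the drug may be avoided)
    lower < score <= upper  -> "intermediate" (e.g. administer only with close monitoring)
    score <= lower          -> "low" (administration may be recommended)
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score > upper:
        return "high"
    if score > lower:
        return "intermediate"
    return "low"


# Hypothetical thresholds of 15 and 19.
for s in (20, 19, 16, 15):
    print(s, classify_adverse_event_risk(s, lower=15, upper=19))
```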

In a related embodiment, SNPs associated with the safety of a drug may be used to improve the safety of the drug by stratifying patient populations to exclude from treatment those individuals likely to exhibit an adverse event in response to administration of the drug. In one example, a new drug is found to have excellent efficacy, tolerance and convenience, however 4% of individuals treated with the drug experience a severe adverse event, and this incidence of adverse events has limited the use of the drug, e.g. to only those individuals for whom other therapies have failed. However, a regulatory agency has stipulated that if the incidence of the adverse event were lowered by at least 50% then the drug could be approved for wider usage. This could be achieved if individuals who are likely to experience the adverse event could be identified prior to treatment, so an association study is performed with a case group of individuals that experienced the adverse event and a control group of individuals that did not to identify a set of 20 SNPs associated with the adverse event. Results from the association study are presented in Table 3 with the risk cutoff values shown in the first column, the % of cases with scores greater than the corresponding risk cutoff value in the second column, the % of controls with scores greater than the corresponding risk cutoff value in the third column, the relative risk in the fourth column, the percent sensitivity in the fifth column, the percent specificity in the sixth column, the PPV (as a percent) in the seventh column, and the NPV (as a percent) in the eighth column.

TABLE 3
Risk Cutoff Value   % Cases   % Controls   Relative risk   Sensitivity   Specificity   PPV     NPV
20                  40.0%     2.8%         14.2            40.0%         97.2%         37.3%   97.5%
19                  51.6%     5.6%         9.2             51.6%         94.4%         27.7%   97.9%
18                  58.0%     9.9%         5.4             58.0%         90.1%         19.6%   98.1%
16                  75.0%     28.5%        2.5             75.0%         71.5%         9.9%    98.6%
15                  91.2%     39.8%        2.3             91.2%         60.2%         8.7%    99.4%
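The PPV and NPV columns of Table 3 are consistent with the sensitivity and specificity columns combined with the stated 4% incidence of the adverse event. The sketch below shows that standard calculation for the cutoff of 19; the function name is hypothetical, and treating the 4% incidence as the prevalence is an assumption that happens to reproduce the tabulated values.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Table 3, risk cutoff 19: sensitivity 51.6%, specificity 94.4%; prevalence 4%.
ppv, npv = predictive_values(0.516, 0.944, 0.04)
print(f"PPV={ppv:.1%} NPV={npv:.1%}")  # PPV=27.7% NPV=97.9%
```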

Using these values, it is found that using 19 as a threshold value would eliminate approximately 51.6% of the patients at highest risk for the adverse event while only eliminating 5.6% of those who could benefit from the drug. Therefore, if 1000 subjects were screened using 19 as the threshold value and assuming that 4% of them are at high risk of experiencing the adverse event, 74 [(1000)(0.04)(0.516)+(1000)(0.96)(0.056)] would be excluded and the remaining 926 could be treated. The risk of adverse events to those treated would therefore be [(1000)(0.04)(1−0.516)/926=0.02], or 2%. Thus, using 19 as a threshold value in a diagnostic to stratify patient populations prior to administering the drug would reduce the incidence of adverse events from 4% to 2%, thereby qualifying the drug for wider usage. Similarly, 18 could also be used as a threshold value, which would exclude approximately 118/1000 individuals [(1000)(0.04)(0.580)+(1000)(0.96)(0.099)], about 23 of whom are at high risk, and would result in an expected incidence of adverse events in the treated individuals of 1.9%. However, this decrease in incidence of adverse events is coupled with a decrease in both the specificity and the PPV for the test. The selection of an appropriate risk/benefit diagnostic threshold value may require not only information about the test itself (specificity, sensitivity, PPV, NPV, etc.), but also interaction between the practitioner of the methods presented herein and a regulatory agency (e.g. FDA) and judgment based on clinical utility. The goal of such a pharmacogenomics test would be to maximize the NPV (reduce the incidence of the adverse event in those treated) while balancing the PPV (minimizing the exclusion of patients who could benefit from the drug). The use of the methods described herein to reduce the frequency of adverse events could help to get a new drug approved, or could encourage the wider use of an already approved drug. For example, by coupling such a diagnostic to a drug it may be possible to reduce the frequency of adverse events to levels that are commercially acceptable, in effect rescuing a drug that would otherwise not have been approved.
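The exclusion arithmetic for a screened cohort can be reproduced with a few lines. This is only a sketch of the bracketed calculations above: the function name is hypothetical, and the inputs are the Table 3 percentages together with the assumed 4% at-risk fraction.

```python
from typing import Tuple


def stratify(
    n: int,
    prevalence: float,
    frac_cases_excluded: float,
    frac_controls_excluded: float,
) -> Tuple[float, float, float]:
    """Return (number excluded, number treated, adverse-event incidence among treated)."""
    at_risk = n * prevalence
    not_at_risk = n - at_risk
    excluded = at_risk * frac_cases_excluded + not_at_risk * frac_controls_excluded
    treated = n - excluded
    incidence = at_risk * (1 - frac_cases_excluded) / treated
    return excluded, treated, incidence


# (cutoff, % cases above cutoff, % controls above cutoff) from Table 3.
for cutoff, cases, controls in [(19, 0.516, 0.056), (18, 0.580, 0.099)]:
    excluded, treated, incidence = stratify(1000, 0.04, cases, controls)
    print(cutoff, round(excluded), round(treated), f"{incidence:.1%}")
```

For the cutoff of 19 this gives about 74 excluded, 926 treated and an incidence near 2%. For the cutoff of 18 the same calculation gives about 118 exclusions in total, roughly 23 of them among the 40 at-risk individuals, and an incidence of about 1.9%.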

It will be clear to those of skill in the art that an appropriate threshold value for approval of a diagnostic to be coupled to a drug is largely dependent on negotiations between a drug sponsor (e.g. a pharmaceutical company) and the regulatory authorities (e.g. FDA). This is the case whether the diagnostic is for improving the efficacy or safety of a drug. For example, although the frequency of adverse events is lowered to 2% in the example above, the regulatory authorities may require a more stringent safety level, and therefore a lower threshold value to identify individuals to exclude from treatment with the drug, thereby sacrificing PPV for a higher NPV.

In certain aspects, the present invention provides greatly improved methods for determining an individual's risk of developing or exhibiting a multifactorial trait. In certain aspects, the methods are further used to develop prognostics, diagnostics, or therapeutics for a multifactorial disease. In other aspects, the methods are further used to predict drug response in individuals prior to administration of a therapeutic regimen. The methods presented herein may further help to reduce the overall cost of medical treatment by providing a means to quickly find the right medical intervention (most efficacious, safest, cheapest, etc.) for an individual so that precious time and money are not misspent on therapies of limited value. It is to be understood that the above description is intended to be illustrative and not restrictive. It readily should be apparent to one skilled in the art that various embodiments and modifications may be made to the invention disclosed in this application without departing from the scope and spirit of the invention. The scope of the invention should, therefore, be determined not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. All publications mentioned herein are cited for the purpose of describing and disclosing reagents, methodologies and concepts that may be used in connection with the present invention. Nothing herein is to be construed as an admission that these references are prior art in relation to the inventions described herein. Throughout the disclosure various patents, patent applications and publications are referenced. Unless otherwise indicated, each is incorporated by reference in its entirety for all purposes.

1. A method for predicting the effectiveness of a drug treatment regimen in an individual comprising a) determining a genotype for said individual at a plurality of biallelic polymorphic loci, wherein each of said plurality has an associated allele and an unassociated allele, and further wherein the genotype is selected from the group consisting of homozygous for the associated allele, heterozygous, and homozygous for an unassociated allele; b) computing a score for said individual based on said genotype determined in a); and c) performing a comparison of the score to at least one threshold value, wherein said comparison is used to predict the effectiveness of said drug treatment regimen in said individual.
 2. The method of claim 1, further comprising identifying the associated alleles and the unassociated alleles for said plurality of biallelic polymorphic loci by performing an association study with a case group for whom said drug treatment regimen is not effective and a control group for whom said drug treatment regimen is effective, thereby determining a set of alleles of said polymorphic loci that are significantly more abundant in the case group than the control group, wherein said set of alleles or a subset thereof are the associated alleles.
 3. The method of claim 2, wherein at least one of said case and said control group comprises at least 200 individuals.
 4. The method of claim 2, wherein said case and said control groups are matched prior to performing the association study.
 5. The method of claim 2, wherein said performing an association study further comprises a) genotyping said case group and said control group at a set of polymorphic loci that comprises said plurality of biallelic polymorphic loci; b) calculating a relative allele frequency for each of said set of polymorphic loci for each of said case group and said control group; c) for each of said set of polymorphic loci, comparing the relative allele frequency calculated for the case group with the relative allele frequency calculated for the control group, thereby identifying a subset of said set of polymorphic loci, wherein each of said subset has a relative allele frequency that is significantly different for the case group than for the control group; and d) determining an allele for each of said subset that is more abundant in said case group than said control group, wherein said allele is one of said associated alleles.
 6. The method of claim 5, wherein the set of polymorphic loci comprises at least about 500 polymorphic loci.
 7. The method of claim 5, wherein the set of polymorphic loci comprises polymorphic loci from every chromosome in the genome of said individual.
 8. (canceled)
 9. (canceled)
 10. The method of claim 9, further comprising validating said associated alleles by performing a second association study with said case group and said control group using an individual genotyping methodology, thereby determining which of said associated alleles are significantly more abundant in said case group than said control group based on said second association study, wherein those of said associated alleles that are significantly more abundant in said case group than said control group based on said second association study are the validated associated alleles.
 11. The method of claim 2, further comprising validating said associated alleles by performing a second association study with a second case group for whom said drug treatment regimen is not effective and a second control group for whom said drug treatment regimen is effective, thereby determining which of said associated alleles are significantly more abundant in the second case group than the second control group, wherein those of said associated alleles that are significantly more abundant in the second case group than the second control group are the validated associated alleles.
 12. The method of claim 2, further comprising determining a one of said at least one threshold value by a method comprising a) calculating a score for each member of said case group and said control group; b) selecting a series of risk cutoff values; c) computing a set of values for each of said series of risk cutoff values, wherein said set of values comprises at least one of a sensitivity, a specificity, a PPV, an NPV, an accuracy, a relative risk, an LR+ and an LR−; d) choosing a one of said series of risk cutoff values as said one of said at least one threshold value based on said set of values, thereby determining said one of said at least one threshold value.
 13. The method of claim 12, wherein calculating a score for each member of said case group and said control group comprises a) determining a genotype for said each member at said plurality of biallelic polymorphic loci, wherein the genotype is selected from the group consisting of homozygous for an associated allele, heterozygous, and homozygous for an unassociated allele; b) assigning a value of zero to each of said polymorphic loci that has a genotype that is homozygous for an allele that is not the associated allele; c) assigning a value of one to each of said polymorphic loci that has a genotype that is heterozygous; d) assigning a value of two to each of said polymorphic loci that has a genotype that is homozygous for the associated allele; e) summing the values determined in steps b) through d) for all said polymorphic loci, thereby calculating a score for said each member of said case group and said control group.
 14. The method of claim 12, wherein said selecting a series of risk cutoff values comprises identifying a highest score from the scores calculated for each member of said case group and said control group; determining a risk cutoff range, wherein the range is from 1 to said highest score; selecting a series of values from across the risk cutoff range, thereby selecting said series of risk cutoff values.
 15. (canceled)
 16. The method of claim 12, wherein said determining said one of said at least one threshold value further comprises using a ROC curve based on said sensitivity and said specificity computed in c), wherein a graphical representation of said ROC curve is referred to as a plot.
 17. The method of claim 16, further comprising choosing as said one of said at least one threshold value a risk cutoff value corresponding to a data point on said ROC curve that is nearer an upper left corner of said plot than any other data point on said ROC curve, wherein each data point on said ROC curve corresponds to a different risk cutoff value.
 18. The method of claim 16, further comprising a) determining a location on said ROC curve that is nearest an upper left corner of said plot and determining a sensitivity and a specificity that correspond to said location; b) analyzing said scores for each member of said case group and said control group to identify a risk cutoff value whose sensitivity and specificity are nearest said sensitivity and specificity that correspond to said location, wherein said risk cutoff value whose sensitivity and specificity are nearest said sensitivity and specificity that correspond to said location is said one of said at least one threshold value.
 19. The method of claim 12, wherein for a given risk cutoff value said relative risk is computed by a method comprising a) determining a percentage of said members of said case group that have a score that is at least as great as said given risk cutoff value; b) determining a percentage of said members of said control group that have a score that is at least as great as said given risk cutoff value; and c) dividing said percentage determined in a) by said percentage determined in b) to compute said relative risk.
 20. (canceled)
 21. The method of claim 1, wherein if said individual's score is higher than said threshold value, said drug treatment regimen is predicted not to be effective in said individual and alternative treatments are considered for said individual; and if said individual's score is equal to or less than said threshold value, said drug treatment regimen is predicted to be effective and is administered to said individual.
 22. (canceled)
 23. The method of claim 1, wherein said computing a score further comprises a) assigning a value of zero to each of said polymorphic loci that has a genotype that is homozygous for an allele that is not the associated allele; b) assigning a value of one to each of said polymorphic loci that has a genotype that is heterozygous; c) assigning a value of two to each of said polymorphic loci that has a genotype that is homozygous for the associated allele; d) summing the values determined in steps a) through c) for all of said polymorphic loci, thereby computing a score for said individual.
 24. A diagnostic or prognostic assay comprising nucleic acid probes designed to detect the associated alleles of claim 1 in a biological sample.
 25. (canceled)
 26. A genetic test for assessing an individual's likelihood of developing or exhibiting a multifactorial trait comprising a) a means for determining a genotype for said individual at a plurality of biallelic polymorphic loci, wherein each of said plurality has an associated allele and an unassociated allele, and further wherein the genotype is selected from the group consisting of homozygous for the associated allele, heterozygous, and homozygous for the unassociated allele; b) at least one threshold value; and c) a comparison between a score computed for said individual based on said genotype determined in a) and said at least one threshold value, wherein said comparison is used to assess said individual's likelihood of developing or exhibiting said multifactorial trait.
 27. The method of claim 26, wherein said multifactorial trait is a disease or a response to a drug.
 28. (canceled)
 29. The method of claim 26, wherein said multifactorial trait is a lack of an efficacious response to said drug or an adverse event caused by the administration of said drug.
 30. The method of claim 29 further comprising excluding said individual from treatment with said drug if said score is greater than a one of said at least one threshold value.
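
By way of illustration only, the additive scoring recited in claims 1, 21 and 23 might be sketched as follows. The locus names, alleles, genotypes, and threshold value are hypothetical, as are the function names `locus_value`, `compute_score`, and `predict_effective`. Counting copies of the associated allele at a biallelic locus yields the recited values of zero, one, and two.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]  # the two alleles observed at one biallelic locus

def locus_value(genotype: Genotype, associated_allele: str) -> int:
    """0 = homozygous unassociated, 1 = heterozygous, 2 = homozygous associated."""
    return sum(1 for allele in genotype if allele == associated_allele)

def compute_score(genotypes: Dict[str, Genotype], associated: Dict[str, str]) -> int:
    """Sum the per-locus values over every locus that has an associated allele."""
    return sum(locus_value(genotypes[locus], allele)
               for locus, allele in associated.items())

def predict_effective(score: int, threshold: int) -> bool:
    """Regimen predicted effective when the score is equal to or less than the threshold."""
    return score <= threshold

# Hypothetical associated alleles and one individual's genotypes.
associated = {"snp1": "A", "snp2": "G", "snp3": "T"}
genotypes = {"snp1": ("A", "A"), "snp2": ("G", "C"), "snp3": ("C", "C")}

score = compute_score(genotypes, associated)   # 2 + 1 + 0
print(score, predict_effective(score, threshold=2))
```

In this hypothetical case the score of 3 exceeds the threshold value of 2, so the regimen would be predicted not to be effective and alternative treatments would be considered.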
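
The threshold determination recited in claims 12, 14, 16, 17 and 19 might likewise be sketched as below. The case and control scores are hypothetical, and the function names are illustrative. The sketch assumes integer scores, takes every integer from 1 to the highest score as the series of risk cutoff values, treats a score at least as great as the cutoff as a positive result (the wording of claim 19), and measures nearness to the upper left corner of the ROC plot by Euclidean distance, which is only one possible measure.

```python
import math

def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as a positive result."""
    true_pos = sum(s >= cutoff for s in case_scores)
    true_neg = sum(s < cutoff for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)

def relative_risk(case_scores, control_scores, cutoff):
    """Fraction of cases at or above the cutoff divided by that fraction for controls."""
    case_frac = sum(s >= cutoff for s in case_scores) / len(case_scores)
    control_frac = sum(s >= cutoff for s in control_scores) / len(control_scores)
    return case_frac / control_frac if control_frac else math.inf

def choose_threshold(case_scores, control_scores):
    """Pick the cutoff whose ROC point lies nearest the upper left corner (0, 1)."""
    highest = max(max(case_scores), max(control_scores))
    best_cutoff, best_dist = None, math.inf
    for cutoff in range(1, highest + 1):          # risk cutoff range: 1 to highest score
        sens, spec = sensitivity_specificity(case_scores, control_scores, cutoff)
        # ROC point is (1 - specificity, sensitivity); distance to the corner (0, 1).
        dist = math.hypot(1 - spec, 1 - sens)
        if dist < best_dist:
            best_cutoff, best_dist = cutoff, dist
    return best_cutoff

# Hypothetical scores for a case group and a control group.
cases = [4, 5, 6, 6, 7, 8]
controls = [1, 2, 2, 3, 3, 4, 5, 6]

threshold = choose_threshold(cases, controls)
print(threshold, relative_risk(cases, controls, threshold))
```

For these hypothetical scores the cutoff of 5 lies nearest the upper left corner, with a sensitivity of 5/6, a specificity of 0.75, and a relative risk of about 3.3.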
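
The association study steps recited in claim 5 might be sketched as follows. The claims do not specify a statistical test, so the two-sided two-proportion z-test and the 0.05 significance level used here are assumptions chosen for illustration, with no correction for multiple testing. The genotype data and all function names are hypothetical.

```python
import math

def allele_frequency(genotypes, allele):
    """Relative frequency of `allele` among all chromosomes in a group."""
    copies = sum(g.count(allele) for g in genotypes)
    return copies / (2 * len(genotypes))

def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (assumed test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def associated_alleles(case_genotypes, control_genotypes, alleles, alpha=0.05):
    """Map each significantly different locus to the allele more abundant in cases."""
    result = {}
    for locus, (a, b) in alleles.items():
        cases, controls = case_genotypes[locus], control_genotypes[locus]
        f_case = allele_frequency(cases, a)
        f_ctrl = allele_frequency(controls, a)
        p = two_proportion_p_value(f_case, 2 * len(cases), f_ctrl, 2 * len(controls))
        if p < alpha:
            # At a biallelic locus, if `a` is not enriched in cases then `b` is.
            result[locus] = a if f_case > f_ctrl else b
    return result

# Hypothetical genotypes for 20 cases and 20 controls at two biallelic loci.
alleles = {"snp1": ("A", "G"), "snp2": ("C", "T")}
case_genotypes = {
    "snp1": ["AA"] * 12 + ["AG"] * 6 + ["GG"] * 2,
    "snp2": ["CC"] * 5 + ["CT"] * 10 + ["TT"] * 5,
}
control_genotypes = {
    "snp1": ["AA"] * 3 + ["AG"] * 7 + ["GG"] * 10,
    "snp2": ["CC"] * 5 + ["CT"] * 10 + ["TT"] * 5,
}

print(associated_alleles(case_genotypes, control_genotypes, alleles))
```

In this hypothetical data set only the first locus differs significantly between the groups, and its allele A, being more abundant in the case group, would be taken as the associated allele.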